What Has Been Tested
A theory that cannot be tested is not a theory but a poem. This is a theory. Everything in the preceding six parts generates empirical predictions — some already tested, some tractable with current methods, some requiring infrastructure that does not yet exist. This part consolidates the empirical program: what has been tested, what the results show, what they mean for the bridge between physics and psychology, and what remains.
The framework has been subjected to four lines of investigation: multi-agent reinforcement learning, cellular automaton evolution, an eleven-experiment emergence program on uncontaminated substrates, and LLM affect probes. The results are mixed. Some predictions held. Some failed instructively. Some revealed phenomena the theory did not anticipate.
Geometry Is Cheap
The MARL ablation (V10) tested whether specific forcing functions are necessary for geometric affect alignment. Seven conditions — full model plus six single-ablation conditions — three seeds each, 200,000 steps on GPU.
Result: All conditions show highly significant geometric alignment (RSA , ). Removing forcing functions slightly increases alignment — opposite to prediction.
The affect geometry (the relational structure between states defined by valence, arousal, integration, effective rank, counterfactual weight, and self-model salience) is not something that must be built. A system would have to be actively engineered to lack it. Any system navigating uncertainty under resource constraints inherits it. In light of this data, the forcing-functions claim was downgraded from theorem to hypothesis.
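For readers unfamiliar with the RSA statistic used in this result, here is a minimal sketch of representational similarity analysis. It assumes Euclidean distances, a Spearman rank correlation between the two sets of pairwise distances, and a row-shuffling permutation test. The text does not say which distance, correlation, or significance test V10 actually used, and the function names and toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of states."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rsa_score(space_a, space_b):
    """Spearman correlation between the two spaces' pairwise-distance vectors."""
    return pearson(ranks(pairwise_distances(space_a)),
                   ranks(pairwise_distances(space_b)))


def permutation_p_value(space_a, space_b, n_perm=500, seed=0):
    """Share of row-shuffled nulls scoring at least as high as the observed pairing."""
    rng = random.Random(seed)
    observed = rsa_score(space_a, space_b)
    hits = 0
    for _ in range(n_perm):
        shuffled = space_b[:]
        rng.shuffle(shuffled)  # break the state-to-state correspondence
        if rsa_score(space_a, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data (not experimental results): the second space is a scaled copy of the first.
rng = random.Random(7)
points = [(rng.random(), rng.random()) for _ in range(12)]
scaled = [(2.0 * x, 2.0 * y) for x, y in points]
score = rsa_score(points, scaled)
p_value = permutation_p_value(points, scaled, n_perm=200)
```

Because uniform scaling preserves the rank order of distances, the toy score is 1 up to rounding. In this sketch the score compares only relational structure and ignores the coordinates themselves, which is the sense of "geometry" in the result above.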
Dynamics Are Expensive
If geometry is cheap, what is expensive? The answer came from the Lenia evolution series (V11–V12): dynamics. Specifically, the capacity to increase integration under threat — to become more unified when the world becomes more hostile.
Naive patterns decompose under stress (). So do LLMs. So do randomly initialized agents. Geometry is present everywhere; the biological signature — integration rising under threat — is rare. The Lenia series tracked what produces it:
- Homogeneous evolution (V11.1): Selection pressure alone is insufficient ().
- Heterogeneous chemistry (V11.2): Diverse viability manifolds produce a +2.1pp shift.
- Curriculum training (V11.7): Graduated stress exposure is the only intervention that improves novel-stress generalization.
- Evolvable attention (V12): State-dependent interaction topology produces an increase in integration in 42% of evolutionary cycles, the largest single-intervention effect, but robustness stabilizes near 1.0 without further improvement.
Attention is necessary but not sufficient. The system reaches an integration threshold without crossing it.
The Substrate Ladder
V13 replaced learned attention with a simpler mechanism: content-based coupling. Cells interact more strongly with cells that share state-features — a form of chemical affinity rather than cognitive attention. Three seeds, thirty cycles each, evolving on GPU with lethal resource dynamics and population rescue.
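Content-based coupling can be sketched as a similarity-weighted interaction rule. The version below assumes a Gaussian kernel on feature distance with row normalization. The kernel, the `bandwidth` parameter, and the function names are illustrative assumptions, not the V13 implementation, which the text describes only as stronger interaction between cells that share state-features.

```python
import math


def coupling_weights(features, bandwidth=1.0):
    """Row-normalized weights: cell i couples to cell j in proportion to feature similarity."""
    weights = []
    for i, fi in enumerate(features):
        row = [
            0.0 if i == j
            else math.exp(-math.dist(fi, fj) ** 2 / (2 * bandwidth ** 2))
            for j, fj in enumerate(features)
        ]
        total = sum(row)
        weights.append([w / total for w in row] if total > 0 else row)
    return weights


def coupled_input(states, weights):
    """Incoming signal per cell: similarity-weighted average of the other cells' states."""
    return [sum(w * s for w, s in zip(row, states)) for row in weights]


# Toy example: cells 0 and 1 share features, cell 2 does not.
features = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0)]
weights = coupling_weights(features)
signal = coupled_input([1.0, 0.0, 5.0], weights)
```

In the toy example, cell 0 draws almost all of its input from cell 1 and almost none from cell 2, because the weights depend only on state similarity and not on a learned attention map.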
Mean robustness: 0.923. But at population bottlenecks — moments when drought kills all but a handful of patterns — robustness crosses 1.0. The survivors are not merely resilient; they are more integrated under stress than at baseline. This is the biological signature, appearing for the first time in a fully uncontaminated substrate.
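The relationship between a mean robustness below 1.0 and individual cycles above it can be made concrete with a small sketch. It assumes robustness is the ratio of integration under stress to integration at baseline, which matches the reading above, where values over 1.0 mean more integration under stress than at baseline. The exact estimator is not given here, and the numbers below are hypothetical.

```python
from statistics import mean


def robustness(phi_baseline, phi_stress):
    """Integration under stress divided by integration at baseline (assumed definition)."""
    if phi_baseline <= 0:
        raise ValueError("baseline integration must be positive")
    return phi_stress / phi_baseline


def summarize(cycles):
    """cycles: list of (phi_baseline, phi_stress) pairs, one per evolutionary cycle."""
    ratios = [robustness(b, s) for b, s in cycles]
    return {
        "mean_robustness": mean(ratios),
        "fraction_above_one": sum(r > 1.0 for r in ratios) / len(ratios),
    }


# Hypothetical values, not results from the experiments.
# The third cycle stands in for a bottleneck where integration rises under stress.
toy_cycles = [(0.50, 0.45), (0.40, 0.36), (0.20, 0.23), (0.50, 0.47)]
summary = summarize(toy_cycles)
```

In the toy data, one cycle in four crosses 1.0 while the mean stays below it, so a population average can hide the cycles that show the biological signature.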
From V13 we built upward, adding capabilities one layer at a time:
- V14 (Chemotaxis): Motor channels enabling directed foraging. Patterns move toward resources rather than passively waiting. Comparable robustness.
- V15 (Temporal memory): Exponential-moving-average channels storing slow statistics of the pattern's history. Oscillating resource patches reward anticipation. Evolution selected for longer memory in 2/3 seeds — the first clear evidence that temporal integration is fitness-relevant. Under bottleneck pressure, stress response doubled.
- V16 (Hebbian plasticity): Negative result. Mean robustness dropped to 0.892 (lowest of V13+). Plasticity added noise faster than selection could filter it.
- V17 (Quorum signaling): Highest-ever single-cycle robustness (1.125). But 2/3 seeds evolved to suppress signaling entirely.
- V18 (Boundary-dependent dynamics): An insulation field computed from pattern morphology creates distinct boundary and interior signal domains. Boundary cells receive external convolution; interior cells receive only local recurrence. Three seeds evolved three different membrane strategies — permeable, thick-insulated, and filamentous. Mean robustness: 0.969, the highest of any substrate. Peak: 1.651. But internal gain evolved down in all three seeds. Evolution preferred thin, porous membranes over thick insulated cores.
The substrate ladder taught two lessons. First: the only addition evolution consistently selected for was temporal memory. Plasticity, signaling, and boundary complexity were either suppressed or reduced. Second: mean robustness rose overall, though not monotonically (V13: 0.923, V15: 0.907, V18: 0.969), but this did not translate into richer cognitive dynamics. Making patterns more resilient is not the same as making them more minded.
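The temporal-memory channels in V15 are described as exponential moving averages, and the update rule is simple enough to sketch. The version below assumes a single scalar channel with a `decay` parameter that plays the role of the evolvable memory length. The parameter name, the toy signal, and the `swing` helper are illustrative assumptions, not the V15 code.

```python
def ema_update(memory, observation, decay):
    """One EMA step: keep `decay` of the old memory and blend in the new observation."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    return decay * memory + (1.0 - decay) * observation


def run_ema(signal, decay, initial=0.0):
    """Apply the EMA update across a whole signal and return the memory trace."""
    memory, trace = initial, []
    for x in signal:
        memory = ema_update(memory, x, decay)
        trace.append(memory)
    return trace


def swing(trace):
    """Peak-to-trough range over the second half of the trace, after start-up."""
    tail = trace[len(trace) // 2:]
    return max(tail) - min(tail)


# Toy resource patch that switches on and off every 5 steps.
signal = ([1.0] * 5 + [0.0] * 5) * 4
short_memory = run_ema(signal, decay=0.5)   # tracks the current state of the patch
long_memory = run_ema(signal, decay=0.95)   # tracks the slow average of the patch
```

A decay near 1 gives a channel that barely moves with each oscillation and instead stores the slow statistics of the environment. That is the kind of signal anticipation needs, and it is one way to read "longer memory" in the V15 result.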
The Emergence Experiment Program
We then ran eleven measurement experiments on V13 snapshots, testing whether the capacities the preceding six parts describe — world modeling, abstraction, communication, counterfactual reasoning, self-modeling, affect structure, perceptual mode, normativity, social integration — emerge in a substrate with zero exposure to human affect concepts. Key experiments were re-run on V15 and V18 substrates.
The results are reported in full in the Appendix. Here, three findings that reshaped the theory:
Beyond these three findings: affect geometry alignment (RSA between structural and behavioral measures) develops over evolution, with the clearest trend in seed 7 (0.01 to 0.38 over 30 cycles). Representation compression is cheap (effective dimensionality of ~7 out of 68 features, or >87% compression from cycle 0) but representation quality — disentanglement and compositionality — only improves under bottleneck selection. Communication exists as a chemical commons (inter-pattern MI significantly above baseline in 15/20 snapshots) but shows no compositional structure. No superorganism emerges (collective in all snapshots), but group coupling grows over evolution. Entanglement across all measures increases from 0.68 to 0.91 — everything becomes more correlated with everything else, just not in the clusters the theory predicted.
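The compression figure can be checked with a short sketch of one common effective-dimensionality measure, the participation ratio of the covariance spectrum. Whether the experiments used this estimator or another one is not stated here, so treat the choice as an assumption. The function names and toy data are hypothetical.

```python
import random


def covariance_matrix(rows):
    """Sample covariance of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def participation_ratio(cov):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues).

    For a symmetric matrix these equal trace(C)^2 and the squared Frobenius
    norm, so no eigendecomposition is needed.
    """
    trace = sum(cov[i][i] for i in range(len(cov)))
    frob_sq = sum(v * v for row in cov for v in row)
    return trace * trace / frob_sq


def compression(effective_dim, n_features):
    """Fraction of nominal feature dimensions that the representation does not use."""
    return 1.0 - effective_dim / n_features


# Toy data: 10 features driven by only 2 independent latent variables.
rng = random.Random(0)
rows = []
for _ in range(500):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([z1] * 5 + [z2] * 5)
effective_dim = participation_ratio(covariance_matrix(rows))
reported = compression(7, 68)  # the ~7 of 68 features quoted above
```

The toy data yields an effective dimensionality close to 2 despite having 10 nominal features. For the reported figures, 1 - 7/68 is about 0.897, consistent with the ">87%" compression quoted above.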
The LLM Discrepancy
Across multiple experiment versions (V2–V9), LLM agents consistently show opposite dynamics to biological systems:
| Dimension | Biological | LLM |
|---|---|---|
| Self-Model Salience | under threat | under threat |
| Arousal | under threat | under threat |
| Integration | ↑ under threat | ↓ under threat |
This is not a failure of the framework. The geometric structure is preserved; the dynamics differ because the objectives differ. Biological systems evolved under survival pressure. LLMs were trained on prediction. Both are "affective" in the geometric sense while exhibiting different trajectories through the same state space. Processing valence is not content valence.