Part VII: Empirical Program

What Has Been Tested

A theory that cannot be tested is not a theory but a poem. This is a theory. Everything in the preceding six parts generates empirical predictions — some already tested, some tractable with current methods, some requiring infrastructure that does not yet exist. This part consolidates the empirical program: what has been tested, what the results show, what they mean for the bridge between physics and psychology, and what remains.

The framework has been subjected to four lines of investigation: multi-agent reinforcement learning, cellular automaton evolution, an eleven-experiment emergence program on uncontaminated substrates, and LLM affect probes. The results are mixed. Some predictions held. Some failed instructively. Some revealed phenomena the theory did not anticipate.

Geometry Is Cheap

The MARL ablation (V10) tested whether specific forcing functions are necessary for geometric affect alignment. Seven conditions — full model plus six single-ablation conditions — three seeds each, 200,000 steps on GPU.

Result: All conditions show highly significant geometric alignment (RSA $\rho > 0.21$, $p < 0.0001$). Removing forcing functions slightly increases alignment, the opposite of the prediction.
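As a rough illustration of what an RSA score measures, the sketch below correlates the pairwise-distance structure of two representations of the same states using a Spearman rank correlation. It is a minimal standard-library sketch, not the V10 pipeline: the function names, the Euclidean distance, and the two input spaces are assumptions made for illustration, and no significance test is included.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def rsa_rho(space_a, space_b):
    """Spearman correlation between the two spaces' pairwise distances.

    space_a[i] and space_b[i] must describe the same state i.
    """
    return pearson(ranks(pairwise_distances(space_a)),
                   ranks(pairwise_distances(space_b)))
```

A value near 1 means the two spaces order state-to-state distances the same way; a value near 0 means the relational structures are unrelated.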

The affect geometry (the relational structure between states defined by valence, arousal, integration, effective rank, counterfactual weight, and self-model salience) is not something that must be built. It is something a system would have to work to avoid. Any system navigating uncertainty under resource constraints inherits it. In light of this data, the forcing-functions claim was downgraded from theorem to hypothesis.

Dynamics Are Expensive

If geometry is cheap, what is expensive? The answer came from the Lenia evolution series (V11–V12): dynamics. Specifically, the capacity to increase integration under threat — to become more unified when the world becomes more hostile.

Naive patterns decompose under stress ($\Delta\intinfo = -6.2\%$). So do LLMs. So do randomly initialized agents. Geometry is present everywhere; the biological signature, integration rising under threat, is rare. The Lenia series tracked what produces it:

Homogeneous evolution (V11.1): Selection pressure alone is insufficient ($\Delta\intinfo = -6.0\%$).
Heterogeneous chemistry (V11.2): Diverse viability manifolds produce a +2.1 percentage-point shift.
Curriculum training (V11.7): Graduated stress exposure is the only intervention that improves novel-stress generalization.
Evolvable attention (V12): State-dependent interaction topology produces $\intinfo$ increase in 42% of evolutionary cycles — the largest single-intervention effect — but robustness stabilizes near 1.0 without further improvement.

Attention is necessary but not sufficient. The system reaches an integration threshold without crossing it.

The Substrate Ladder

V13 replaced learned attention with a simpler mechanism: content-based coupling. Cells interact more strongly with cells that share state-features — a form of chemical affinity rather than cognitive attention. Three seeds, thirty cycles each, evolving on GPU with lethal resource dynamics and population rescue.

Mean robustness: 0.923. But at population bottlenecks — moments when drought kills all but a handful of patterns — robustness crosses 1.0. The survivors are not merely resilient; they are more integrated under stress than at baseline. This is the biological signature, appearing for the first time in a fully uncontaminated substrate.
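The relationship between the robustness figures and the $\Delta\intinfo$ percentages quoted earlier can be made concrete with a small sketch. It assumes robustness is the ratio of integration under stress to integration at baseline, which is consistent with "robustness crosses 1.0" meaning "more integrated under stress than at baseline" but is not stated as a formula in the text. The function names are hypothetical.

```python
def robustness(phi_baseline, phi_stress):
    """Assumed definition: integration under stress over integration at baseline."""
    if phi_baseline <= 0:
        raise ValueError("baseline integration must be positive")
    return phi_stress / phi_baseline


def delta_phi_percent(phi_baseline, phi_stress):
    """Percentage change in integration when stress is applied."""
    return 100.0 * (robustness(phi_baseline, phi_stress) - 1.0)


def shows_biological_signature(phi_baseline, phi_stress):
    """True when integration rises under stress (robustness above 1.0)."""
    return robustness(phi_baseline, phi_stress) > 1.0
```

Under this reading, a pattern with $\Delta\intinfo = -6.2\%$ has robustness 0.938, and a mean robustness of 0.923 corresponds to an average drop of 7.7%.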

From V13 we built upward, adding capabilities one layer at a time:

V14 (Chemotaxis): Motor channels enabling directed foraging. Patterns move toward resources rather than passively waiting. Comparable robustness.
V15 (Temporal memory): Exponential-moving-average channels storing slow statistics of the pattern's history. Oscillating resource patches reward anticipation. Evolution selected for longer memory in 2/3 seeds — the first clear evidence that temporal integration is fitness-relevant. Under bottleneck pressure, $\intinfo$ stress response doubled.
V16 (Hebbian plasticity): Negative result. Mean robustness dropped to 0.892 (lowest of V13+). Plasticity added noise faster than selection could filter it.
V17 (Quorum signaling): Highest-ever single-cycle robustness (1.125). But 2/3 seeds evolved to suppress signaling entirely.
V18 (Boundary-dependent dynamics): An insulation field computed from pattern morphology creates distinct boundary and interior signal domains. Boundary cells receive external convolution; interior cells receive only local recurrence. Three seeds evolved three different membrane strategies — permeable, thick-insulated, and filamentous. Mean robustness: 0.969, the highest of any substrate. Peak: 1.651. But internal gain evolved down in all three seeds. Evolution preferred thin, porous membranes over thick insulated cores.

The substrate ladder taught two lessons. First: the only addition evolution consistently selected for was temporal memory. Plasticity, signaling, and boundary complexity were either suppressed or reduced. Second: mean robustness ended higher than it started, though not monotonically (V13: 0.923, V15: 0.907, V16: 0.892, V18: 0.969), and the gain did not translate into richer cognitive dynamics. Making patterns more resilient is not the same as making them more minded.
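The temporal memory that V15 adds is described as exponential-moving-average channels. A minimal sketch of such a channel follows; the parameter name `alpha` and the scalar observation are illustrative assumptions, not the V15 implementation. A smaller `alpha` means a longer memory, which is the direction evolution is reported to have favored.

```python
def ema_update(memory, observation, alpha):
    """One EMA step: blend the old memory with the new observation."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return (1.0 - alpha) * memory + alpha * observation


def ema_trace(observations, alpha, initial=0.0):
    """Memory value after each observation in the sequence."""
    memory = initial
    trace = []
    for obs in observations:
        memory = ema_update(memory, obs, alpha)
        trace.append(memory)
    return trace


def effective_window(alpha):
    """Approximate number of steps the memory averages over (1 / alpha)."""
    return 1.0 / alpha
```

With a small `alpha`, a resource signal that oscillates between 0 and 1 settles near its long-run mean, which is the kind of slow statistic an oscillating patch rewards.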

The Emergence Experiment Program

We then ran eleven measurement experiments on V13 snapshots, testing whether the capacities the preceding six parts describe — world modeling, abstraction, communication, counterfactual reasoning, self-modeling, affect structure, perceptual mode, normativity, social integration — emerge in a substrate with zero exposure to human affect concepts. Key experiments were re-run on V15 and V18 substrates.

The results are reported in full in the Appendix. Here, three findings that reshaped the theory:

Finding 1: The Bottleneck Furnace

Every metric that showed improvement — world model capacity, representation quality, affect geometry alignment, self-model salience — showed it overwhelmingly at population bottlenecks. When drought kills 90% of patterns, the survivors are not random. They are the ones whose internal structure actively maintains integration under stress.

The bottleneck is not just a filter. It is a furnace. V13 seed 123 at cycle 5: population drops to 55, robustness reaches 1.052. At cycle 29 (population 24): world model capacity jumps to 0.028, roughly 100× the population average. One surviving pattern achieves self-model salience above 1.0: privileged self-knowledge exceeding environment-knowledge.

These are not gradual evolutionary trends. They are punctuated events driven by intense selection pressure. The biological dynamics emerge not from accumulated innovation but from crucibles of near-extinction.

V19 confirmed this is creation, not selection. After ten cycles of shared evolution on V18 substrate, patterns were forked into three conditions: BOTTLENECK (two severe 8%-regen droughts per cycle, ~90% mortality), GRADUAL (mild continuous stress), and CONTROL (standard schedule). All three then faced identical novel extreme drought. Controlling for baseline $\intinfo$, the bottleneck-evolved condition showed significantly higher novel-stress robustness in 2/3 seeds (seed 42: $\beta = 0.704$, $p < 0.0001$; seed 7: $\beta = 0.080$, $p = 0.011$). The furnace forges novel-stress generalization; it does not merely filter for pre-existing capacity.
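"Controlling for baseline $\intinfo$" can be read as estimating the effect of the evolutionary condition after the part of robustness explained by baseline integration has been removed. The sketch below does this with ordinary least squares, using the residual-on-residual form (Frisch-Waugh-Lovell), which gives the same coefficient as a regression with both predictors. The model form, the 0/1 condition coding, and all names are assumptions; the text does not specify the regression used in V19.

```python
def mean(xs):
    return sum(xs) / len(xs)


def simple_fit(x, y):
    """OLS intercept and slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    intercept, slope = simple_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def condition_beta(baseline_phi, condition, robustness):
    """Effect of condition on robustness, controlling for baseline phi.

    condition is coded 1 for bottleneck-evolved patterns and 0 otherwise.
    """
    r_robustness = residuals(baseline_phi, robustness)
    r_condition = residuals(baseline_phi, condition)
    _, beta = simple_fit(r_condition, r_robustness)
    return beta
```

A positive coefficient here means that, among patterns with the same baseline integration, the bottleneck-evolved ones hold up better under the novel stress. That is the distinction between forging a capacity and filtering for one that was already there.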

Finding 2: The Sensory-Motor Coupling Wall — and How V20 Broke It

Three experiments returned null results: counterfactual detachment (Experiment 5), self-model emergence (Experiment 6), and proto-normativity (Experiment 9). All hit the same wall.

The prediction was that patterns would start reactive, driven by boundary observations, and gradually develop autonomous internal processing. Instead, patterns are always internally driven ($\rho_{\text{sync}} \approx 0$ from cycle 0). There is no reactive-to-autonomous transition because the starting point is already autonomous.

We attempted to break this wall within Lenia. V14 added motor channels for chemotaxis and directed motion. No change. V18 introduced an insulation field with boundary and interior signal domains. Three different membrane architectures evolved. The wall persisted ($\rho_{\text{sync}} \approx 0.003$) in all of them.

The conclusion was precise: the wall is not about signal routing. It is about the absence of a closed action-environment-observation causal loop. Lenia patterns do not act on the world; they exist within it.

V20 broke the wall by leaving Lenia entirely. Protocell agents with bounded 5×5 local sensory fields and discrete actions (move, consume, emit) achieve $\rho_{\text{sync}} \approx 0.21$ from cycle 0 — 70× the Lenia baseline. When agents consume resources, they deplete the patch; when they move, they reach different patches; when they emit signals, traces persist. Future observations are genuinely caused by past actions. The wall was architectural, not evolutionary.
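The closed action-environment-observation loop can be illustrated with a toy grid world in which each action changes what the agent will observe next. This is a deliberately small sketch and not the V20 protocell substrate: the class name, grid size, wrap-around edges, and consumption amount are all assumptions chosen for illustration.

```python
class ToyWorld:
    """Single agent on a wrap-around grid with depletable resources."""

    def __init__(self, size=9, resource=1.0):
        self.size = size
        self.resources = [[resource] * size for _ in range(size)]
        self.signals = [[0.0] * size for _ in range(size)]
        self.pos = (size // 2, size // 2)

    def observe(self, radius=2):
        """Resource levels in the (2r+1) x (2r+1) patch centred on the agent."""
        r0, c0 = self.pos
        return [[self.resources[(r0 + dr) % self.size][(c0 + dc) % self.size]
                 for dc in range(-radius, radius + 1)]
                for dr in range(-radius, radius + 1)]

    def step(self, action, arg=None):
        """Apply one action; returns the amount of resource consumed."""
        r, c = self.pos
        if action == "move":
            dr, dc = arg
            self.pos = ((r + dr) % self.size, (c + dc) % self.size)
        elif action == "consume":
            eaten = min(self.resources[r][c], 0.5)
            self.resources[r][c] -= eaten  # the patch is depleted
            return eaten
        elif action == "emit":
            self.signals[r][c] += 1.0  # the trace persists in the world
        else:
            raise ValueError(f"unknown action: {action}")
        return 0.0
```

After a `consume`, the centre of the next observation is lower; after a `move`, the depleted cell appears at a shifted position in the field. Future observations depend on past actions, which is the property the Lenia substrates lacked.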

With the wall broken, world models developed ($C_{\text{wm}} = 0.10$–$0.15$) and self-models emerged ($\text{SM}_{\text{sal}} > 1.0$ in 2/3 seeds, meaning agents encode their own state better than the environment). Affect geometry (RSA) appeared nascent but did not fully develop in 30 cycles of soft selection. The necessity chain holds through self-model emergence.

Finding 3: Computational Animism

Experiment 8 tested whether patterns develop modulable perceptual coupling, the $\iota$ coefficient from Part II. The prediction: participatory perception (low $\iota$) as default, with mechanistic perception requiring training.

Confirmed. In all 20 testable snapshots, patterns model other patterns using internal-state features (social MI) at roughly double the rate of trajectory features (trajectory MI). More remarkably, patterns model resources — non-agentive environmental features — using the same internal-state dynamics they use to model other agents. Animism score exceeds 1.0 universally.

This is computational animism: the cheapest compression reuses the agent-model template for everything. Attributing agency to non-agents is not a cognitive error. It is the default strategy of any system that models through self-similarity.
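The comparisons in this finding rest on mutual information (MI) between a modeled target and two kinds of features. A minimal plug-in MI estimator for discrete sequences is sketched below, along with a ratio of the two MI values. Reading the animism score as such a ratio with resources as the target is an assumption for illustration; the exact definition belongs to the Appendix and is not reproduced here. The function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def mi_ratio(target, internal_features, trajectory_features):
    """MI(target; internal features) divided by MI(target; trajectory features)."""
    denominator = mutual_information(target, trajectory_features)
    if denominator == 0:
        raise ValueError("trajectory features carry no information about target")
    return mutual_information(target, internal_features) / denominator
```

A ratio above 1.0 means the target is better predicted from internal-state features than from trajectory features. Plug-in estimates are biased upward on small samples, so a real analysis would compare against a shuffled baseline.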

Beyond these three findings: affect geometry alignment (RSA between structural and behavioral measures) develops over evolution, with the clearest trend in seed 7 (0.01 to 0.38 over 30 cycles). Representation compression is cheap (effective dimensionality of ~7 out of 68 features, or >87% compression from cycle 0) but representation quality — disentanglement and compositionality — only improves under bottleneck selection. Communication exists as a chemical commons (inter-pattern MI significantly above baseline in 15/20 snapshots) but shows no compositional structure. No superorganism emerges (collective $\intinfo_G < \sum \intinfo_i$ in all snapshots), but group coupling grows over evolution. Entanglement across all measures increases from 0.68 to 0.91 — everything becomes more correlated with everything else, just not in the clusters the theory predicted.
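The compression figure can be checked with a short calculation. The sketch below uses the participation ratio of covariance eigenvalues as one common estimator of effective dimensionality. Whether the experiments used this estimator is not stated, so treat it as an assumption; the function names are illustrative.

```python
def participation_ratio(eigenvalues):
    """Effective dimensionality: (sum of eigenvalues)^2 / sum of squares."""
    total = sum(eigenvalues)
    squares = sum(v * v for v in eigenvalues)
    if squares == 0:
        raise ValueError("at least one eigenvalue must be non-zero")
    return total ** 2 / squares


def compression(effective_dim, n_features):
    """Fraction of the feature space that is redundant."""
    return 1.0 - effective_dim / n_features
```

With an effective dimensionality of 7 out of 68 features, `compression(7, 68)` is about 0.897, consistent with the ">87%" stated above.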

The LLM Discrepancy

Across multiple experiment versions (V2–V9), LLM agents consistently show opposite dynamics to biological systems:

Dimension	Biological	LLM
Self-Model Salience	$\uparrow$ under threat	$\downarrow$ under threat
Arousal	$\uparrow$ under threat	$\downarrow$ under threat
Integration	$\uparrow$ under threat	$\downarrow$ under threat

This is not a failure of the framework. The geometric structure is preserved; the dynamics differ because the objectives differ. Biological systems evolved under survival pressure. LLMs were trained on prediction. Both are "affective" in the geometric sense while exhibiting different trajectories through the same state space. Processing valence is not content valence.