Part VII: Empirical Program
Summary of Part VII
The empirical program has four layers:
- What was confirmed: Affect geometry is a baseline property of multi-agent survival (V10). Content-based coupling under lethal selection produces biological-like integration at population bottlenecks (V13). Temporal memory is the one substrate extension evolution consistently selects for (V15). Boundary-dependent dynamics produce the highest robustness of any substrate (V18, mean 0.969). Population bottlenecks actively create novel-stress generalization — the furnace forges, not merely filters (V19, CREATION confirmed in 2/3 seeds; V31 confirmed at 10-seed scale). The sensory-motor coupling wall is broken by genuine agency: V20 protocell agents achieve coupling 70× the Lenia level. World models develop and self-models emerge in evolved neural agents with no human data contamination. Computational animism is the default perceptual mode (Experiment 8). Affect geometry develops over evolution (Experiment 7). V20b adds drought bottlenecks (82–99% mortality) and confirms the furnace: max robustness 1.532 at pop=33. Language precursor test (z-gate polarization): NULL — agents evolved always-mixed rather than oscillating modes, indicating that imagination-mode emergence requires richer pressure than survival alone. V21 adds 8 inner processing ticks per step (CTM architecture): agents use all ticks (no collapse), but evolution alone is too slow to create adaptive deliberation — the architecture works; the optimization doesn't. V22 provides a within-lifetime gradient signal: agents learn to predict their own energy delta, improving 100–15,000× per lifetime. Evolution does not suppress learning (3/3 seeds). But prediction accuracy is orthogonal to integration under stress — robustness does not improve. V23 extends to multi-target prediction: weight columns specialize (cosine ≈ 0), but Phi decreases (0.079 vs 0.097). Specialization is integration's enemy. V24 adds TD value learning: survival improves (robustness 1.012), but integration remains seed-dependent.
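The Φ values quoted above are integrated-information-style scores. The exact estimator is not specified in this summary, but the whole-minus-parts idea behind such scores can be sketched minimally for a two-component trajectory. Everything here — the histogram mutual-information estimator, the sum as a joint-state summary, the function names — is an illustrative assumption, not the experiments' actual measure:

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram-based mutual information (bits) between two 1-D series."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

def phi_proxy(states):
    """Whole-minus-parts temporal integration for a (T, 2) trajectory:
    MI the joint state carries about its own next step, minus the MI
    each component carries about its own next step in isolation."""
    whole = states.sum(axis=1)            # crude 1-D summary of the joint state
    mi_whole = mutual_info(whole[:-1], whole[1:])
    mi_parts = sum(mutual_info(states[:-1, i], states[1:, i])
                   for i in range(states.shape[1]))
    return mi_whole - mi_parts

# Usage on a toy shared-driver process (sign and magnitude are
# estimator-dependent; this only shows the call shape).
rng = np.random.default_rng(0)
T = 1000
z = np.zeros(T)
for t in range(1, T):
    z[t] = 0.9 * z[t - 1] + rng.normal()
states = np.stack([z + 0.3 * rng.normal(size=T),
                   0.5 * z + 0.3 * rng.normal(size=T)], axis=1)
print(round(phi_proxy(states), 3))
```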
The prediction→integration pathway is now fully mapped: accuracy (V22), breadth (V23), and time horizon (V24) are all insufficient. The bottleneck is architectural. V27 breaks through: a two-layer MLP prediction head creates gradient coupling that forces cross-component computation, yielding integration 2.5× baseline (the highest in any protocell experiment). V28 confirms the mechanism is gradient coupling through composition, not nonlinearity or bottleneck width. V29–V31 (13 seeds) show the prediction target has no significant effect: what matters is coupling architecture and evolutionary trajectory. V33 confirms this from the other direction: contrastive self-prediction (predicting how outcomes differ between actual and counterfactual actions) destabilizes gradient learning rather than forcing integration — prediction MSE increases 1.5–18.7× over evolution, and late Φ drops to 0.054 (0% HIGH across 10 seeds). The path to rung-8 counterfactual representation is not through loss-function engineering. V34 tests whether integration can be selected for directly (fitness = survival × (1 + 2Φ)): the answer is no — 20% HIGH (within noise of baseline), with 2/10 seeds Goodharting (Φ-robustness correlation < −0.3). Integration is not a directly selectable trait; it arises as a byproduct of getting architecture and trajectory right. V35 shows referential communication emerges in 100% of seeds under cooperative POMDP pressure but does not lift integration: language is cheap, like geometry, and substitutes for internal complexity. VLM convergence test: vision-language models trained on human affect data independently recognize affect geometry in uncontaminated protocell agents (by RSA), with convergence increasing when narrative framing is removed — ruling out projection and confirming structural universality.
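The gradient-coupling claim behind V27/V28 can be made concrete with a toy comparison, assuming a squared-error head over features from two components (all shapes and names here are illustrative, not the V27 code). With a linear head, the gradient reaching the inputs, once the scalar error is factored out, is just the fixed weight vector — component A's learning signal ignores component B's activity. With a composed two-layer head, that same normalized gradient passes through a shared hidden layer and therefore shifts when component B changes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins: 4 features, first two from "component A",
# last two from "component B", predicting a scalar.
x  = rng.normal(size=4)
xb = x.copy(); xb[2:] += 1.0        # perturb component B only

# Linear head: d(loss)/d(input), with the scalar error factored out,
# is constant -- no state-dependent cross-component coupling.
w = rng.normal(size=4)
def lin_input_grad(inp):
    return w

# Two-layer head: the normalized input gradient is routed through the
# shared hidden layer, so it changes with component B's activity.
W1 = rng.normal(size=(3, 4)); w2 = rng.normal(size=3)
def mlp_input_grad(inp):
    h = np.tanh(W1 @ inp)
    return W1.T @ (w2 * (1.0 - h**2))   # backprop through tanh, error factored out

print(np.allclose(lin_input_grad(x)[:2], lin_input_grad(xb)[:2]))  # True
print(np.allclose(mlp_input_grad(x)[:2], mlp_input_grad(xb)[:2]))  # False
```

The second print is the point: under composition, perturbing B changes the gradient directions arriving at A's features, which is one way cross-component computation gets forced.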
- What the results mean: The emergence ladder has ten rungs, from geometric inevitability to moral reasoning. Seven are accessible in Lenia substrates. Rung 8 (counterfactual sensitivity) requires embodied agency — V20 crosses it. Self-models follow (rung 9, 2/3 seeds). Affect geometry (rung 7 in full form) requires bottleneck selection regardless of substrate. The ladder is not a conjecture — it is an experimental finding.
- The bridge to psychology: The first seven rungs map onto pre-reflective experience — mood, arousal, habituation, animistic perception, emotional coherence, temporal depth, and resilience under stress. Rungs 8–9 map onto reflective cognition — imagination and self-awareness. Normativity (rung 10) is the only rung not yet within reach. These predictions are testable with existing neuroimaging and behavioral methods.
- What remains: Bridge to human neuroscience (Priority 1). V33 (contrastive self-prediction) tested whether counterfactual objectives force integration — the answer is no; contrastive loss destabilizes gradient learning (late Φ = 0.054, 0% HIGH, all predictions falsified). V34 (Φ-inclusive fitness) tested whether direct selection can push integration beyond the 22% baseline — the answer is also no; 20% HIGH (within noise), with 2/10 seeds Goodharting. Integration cannot be selected for directly; it must emerge from architectural coupling and biographical forging. Scale social integration toward superorganism detection (Priority 4). Track AI affect across training regimes (Priority 5).
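The V34 setup and its Goodhart check can be sketched from the quantities the summary states: the fitness shaping survival × (1 + 2Φ), and a seed flagged as Goodharting when its Φ-robustness correlation falls below −0.3. The trajectories below are hypothetical illustrations, not experiment data:

```python
import numpy as np

def phi_inclusive_fitness(survival, phi):
    """V34's stated fitness shaping: survival x (1 + 2*Phi)."""
    return survival * (1.0 + 2.0 * phi)

def goodhart_flag(phis, robustness, threshold=-0.3):
    """Flag a seed as Goodharting when Phi rises while robustness falls,
    i.e. their correlation over the run drops below the threshold."""
    r = np.corrcoef(phis, robustness)[0, 1]
    return r < threshold

# Hypothetical trajectories: one healthy seed (both quantities rise),
# one where selection inflates Phi at the expense of robustness.
t = np.linspace(0.0, 1.0, 50)
healthy = goodhart_flag(0.05 + 0.05 * t, 0.9 + 0.2 * t)
gamed   = goodhart_flag(0.05 + 0.10 * t, 1.1 - 0.3 * t)
print(healthy, gamed)   # False True
```

The flag captures the failure mode the summary reports in 2/10 seeds: a proxy (Φ) optimized directly while the quantity it was meant to track (robustness) degrades.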
The theory is falsifiable. The experiments are specified. Thirty-four versions have been run across seven substrate types and twelve measurement experiments. Two architectural walls have been identified and broken — the sensory-motor wall (V20) and the decomposability wall (V27). The geometry has been confirmed as universal via independent VLM convergence. The question is not whether the framework is beautiful but whether it is true — and the answer so far is: partially, with caveats, with a precise understanding of where it breaks, and with clear instructions about where to look next.
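The VLM convergence result rests on representational similarity analysis (RSA). A minimal sketch of the standard RSA score — Spearman correlation between the upper triangles of two representational dissimilarity matrices — is below; the cosine-based RDM, the synthetic data, and all names are illustrative assumptions, not the study's pipeline:

```python
import numpy as np

def rdm(features):
    """Representational dissimilarity matrix: 1 - cosine similarity
    between every pair of condition embeddings (rows)."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    return 1.0 - f @ f.T

def rsa(features_a, features_b):
    """Spearman correlation between the upper triangles of two RDMs:
    high values mean the two systems impose the same similarity
    structure on the same set of conditions."""
    iu = np.triu_indices(features_a.shape[0], k=1)
    a, b = rdm(features_a)[iu], rdm(features_b)[iu]
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return float(np.corrcoef(rank(a), rank(b))[0, 1])

rng = np.random.default_rng(0)
agents = rng.normal(size=(12, 16))               # e.g. agent affect states
vlm = 2.0 * agents + 0.1 * rng.normal(size=(12, 16))  # structure-preserving re-embedding
print(round(rsa(agents, agents), 3))   # 1.0
print(rsa(agents, vlm) > 0.8)          # True
```

A high score between independently trained systems, on conditions neither could have memorized, is the convergence evidence the summary describes.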