Experiments

V24: TD Value Learning

Period: 2026-02-19. Substrate: V22 + temporal difference value function (semi-gradient TD).

Hypothesis: Long-horizon prediction via the value function $V(s) = \mathbb{E}\left[\sum_t \gamma^t r_t\right]$ integrates over all possible futures and is therefore inherently non-decomposable.
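For concreteness, here is a minimal sketch of a semi-gradient TD(0) update with a linear value readout. It is not the V24 implementation: the two-state toy chain, the one-hot features, the constant reward of 1, and the names `td0_update` and `run_toy_chain` are all assumptions made for illustration.

```python
def td0_update(w, s, r, s_next, gamma, alpha):
    """One semi-gradient TD(0) step for a linear value function V(s) = w . s."""
    v = sum(wi * si for wi, si in zip(w, s))
    v_next = sum(wi * si for wi, si in zip(w, s_next))
    # TD error: bootstrapped target (r + gamma * V(s')) minus current estimate.
    td_error = r + gamma * v_next - v
    # Semi-gradient: only V(s) is differentiated; the target is held constant.
    new_w = [wi + alpha * td_error * si for wi, si in zip(w, s)]
    return new_w, td_error


def run_toy_chain(steps=5000, gamma=0.75, alpha=0.1):
    """Hypothetical two-state cycle with reward 1 on every step."""
    states = [[1.0, 0.0], [0.0, 1.0]]  # one-hot features (an assumption)
    w = [0.0, 0.0]
    abs_errors = []
    i = 0
    for _ in range(steps):
        j = 1 - i  # deterministic alternation between the two states
        w, delta = td0_update(w, states[i], 1.0, states[j], gamma, alpha)
        abs_errors.append(abs(delta))
        i = j
    return w, abs_errors


weights, abs_errors = run_toy_chain()
print(weights, abs_errors[0], abs_errors[-1])
```

In this toy setting the true value of both states is 1/(1 − γ) = 4 when γ = 0.75, so the learned weights should approach 4 and the absolute TD error should shrink toward zero.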

| Metric | Seed 42 | Seed 123 | Seed 7 | Mean |
|---|---|---|---|---|
| Mean robustness | 1.034 | 0.998 | 1.003 | 1.012 |
| Mean Φ | 0.051 | 0.072 | 0.130 | 0.084 |
| Final γ | 0.748 | 0.746 | 0.741 | 0.745 |
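As a quick arithmetic check, the short sketch below recomputes the Mean column from the three per-seed values and compares the max-to-min ratio across seeds for each metric. The dictionary keys and variable names are illustrative choices, not identifiers from the experiment code.

```python
# Per-seed values copied from the table (seeds 42, 123, 7).
rows = {
    "robustness": [1.034, 0.998, 1.003],
    "phi": [0.051, 0.072, 0.130],
    "gamma": [0.748, 0.746, 0.741],
}

# Arithmetic mean across seeds, rounded to the table's three decimals.
means = {name: round(sum(vals) / len(vals), 3) for name, vals in rows.items()}

# Ratio of largest to smallest seed value: a crude measure of seed dependence.
ratios = {name: max(vals) / min(vals) for name, vals in rows.items()}

for name in rows:
    print(f"{name}: mean={means[name]:.3f}, max/min={ratios[name]:.2f}")
```

The ratio makes the seed dependence visible: Φ varies by more than a factor of 2.5 across seeds, while the evolved γ varies by less than 1%.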

Finding: Best mean robustness of any prediction experiment (1.012). Agents evolve a moderate discount factor (γ ≈ 0.75, an effective horizon of about 4 steps, since 1/(1 − γ) = 4). However, Φ is seed-dependent, ranging from 0.051 to 0.130 across seeds. The bottleneck is architectural: a single linear value readout does not force non-decomposable structure.
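The two quantitative points in this finding can be sketched in a few lines. The first function computes the effective horizon 1/(1 − γ). The second part illustrates one reading of "decomposable", namely that a linear readout is additive over any partition of the hidden units; that reading, the weights, the state vector, and the function names are assumptions for illustration, not taken from the V24 code.

```python
def effective_horizon(gamma):
    """Effective planning horizon of a discount factor: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


def linear_value(w, s):
    """Linear value readout V(s) = sum_i w_i * s_i."""
    return sum(wi * si for wi, si in zip(w, s))


def mask(s, keep):
    """Zero out every unit whose index is not in `keep`."""
    return [si if i in keep else 0.0 for i, si in enumerate(s)]


# Hypothetical readout weights and hidden state (illustrative values only).
w = [0.5, -1.0, 2.0, 0.25]
s = [1.0, 2.0, 0.5, 4.0]

v_full = linear_value(w, s)
v_part_a = linear_value(w, mask(s, {0, 1}))  # contribution of units 0-1 alone
v_part_b = linear_value(w, mask(s, {2, 3}))  # contribution of units 2-3 alone

print(effective_horizon(0.75))
print(v_full, v_part_a + v_part_b)
```

Because the full value equals the sum of the two partial values, the readout can be fit perfectly without the two halves of the hidden state ever interacting, which is consistent with TD error falling while Φ stays low.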

Figure: V24 evolution trajectories (robustness, integration, population, and TD error). Top-left: robustness, with dramatic spikes at drought boundaries (up to 1.5), the highest transient robustness of any prediction experiment. Top-right: Φ shows the widest spread (0.02–0.25), with seed 7 reaching high values mid-evolution before declining. Bottom-left: population dynamics. Bottom-right: TD error decreases over evolution, so value learning works, but it does not force integration because the linear readout is decomposable.
Figure: Robustness and integration compared across the V22, V23, and V24 prediction experiments. Left: mean robustness. V24 (TD value) achieves the highest (~1.01), crossing the 1.0 threshold, while V22 and V23 cluster below 1.0. Right: mean Φ. All three experiments overlap in the 0.06–0.10 range with high per-seed variance, and the individual seed dots show that no experiment consistently outperforms the others. Changing the prediction target (scalar energy, multi-target, temporal value) does not reliably change integration; only architecture does (see V27).

Source code