Experiments

VLM Convergence Experiment

VLM Convergence Experiment

Cross-Substrate ConvergenceVLMs trained on human data recognize affect in protocellsHuman Dataimages, text, faces,emotion labels↓ trainProtocell Simgrid world snapshots,affect metrics, no labels↓ measureGPT-4oClaude 3.5Gemini 1.5Affect SpaceV, A, Φ, r_eff, CF, SM(uncontaminated)RSA ρ = 0.72RSA ρ = 0.54RSA ρ = 0.78Pre-Registered Predictions: 4/4 PASSDrought → desperation/anxietyRecovery → relief/optimismStable → contentmentNarrative removal → correlation holds

Status: Complete. Both models tested.

Core question: If affect geometry is universal, do systems trained on human affect data (GPT-4o, Claude) independently recognize the same affect signatures in completely uncontaminated substrates?

Method: 48 behavioral vignettes extracted from / protocell data across 6 conditions (normal foraging, pre-drought abundance, drought onset, drought survival, post-drought recovery, late-stage evolution). Presented to VLMs with purely behavioral descriptions — no affect language, no framework terms, explicitly labeled as artificial systems. Framework predictions computed independently. Convergence measured via Representational Similarity Analysis (RSA) between framework-predicted and -labeled affect spaces.

Result: STRONG CONVERGENCE. GPT-4o: RSA ρ=0.72\rho = 0.72 (p<0.0001p < 0.0001). Claude Sonnet: ρ=0.54\rho = 0.54 (p<0.0001p < 0.0001). All four pre-registered predictions pass on both models:

  • P1: VLMs label drought onset as fear/anxiety — PASS (both: desperation, anxiety, urgency, 8/8 unanimous)
  • P2: VLMs label post-drought recovery as relief/hope — PASS (both: relief, cautious optimism)
  • P3: VLMs distinguish HIGH vs LOW late-stage — see condition summary
  • P4: RSA between framework and affect spaces > 0.3 — PASS (0.72 and 0.54)

Robustness check: raw numbers only. Re-ran with purely numerical descriptions (no narrative framing — just measured quantities like removal_fraction: 0.9800). Convergence increases: GPT-4o ρ=0.78\rho = 0.78, Claude ρ=0.72\rho = 0.72. This rules out narrative pattern-matching. The VLMs recognize geometric structure from raw numerical patterns — population dynamics and state update rates are sufficient.

Robustness check: basis-independence. Standard RSA standardizes each affect axis before correlating, which makes it depend on the chosen coordinate basis — so a critic can ask whether the convergence merely reflects six control-theoretic coordinates any viable controller exhibits. Re-measured with basis-independent tools (): RSA on rotation-invariant raw-Euclidean dissimilarity matrices (exactly invariant to rotations/reflections of either space, permutation null) and Gromov-Wasserstein distance. The convergence does not weaken — it strengthens. Rotation-invariant RSA rises to ρ=0.892\rho = 0.892 for GPT-4o and ρ=0.810\rho = 0.810 for Claude (both p<0.001p < 0.001, 2000-permutation null), and is unchanged to machine precision (Δ1017|\Delta| \sim 10^{-17}) when the affect space is randomly rotated; GW distance is likewise rotation-invariant. The cross-substrate alignment is a property of the relational structure, not of the coordinate choice. (The same tool applied to the much weaker within-substrate affect-to-behavior alignment of / shows the opposite: its modest standard-RSA ρ0.07\rho \approx 0.07 collapses to 0\approx 0 basis-independently — that alignment was a basis artifact, and nothing load-bearing rests on it.)

Theoretical significance: Two VLMs, trained independently on human data, with no exposure to the framework, produce affect labels that match framework geometric predictions for a system that has never encountered human affect concepts. The convergence happens because both are tapping the same underlying structure: affect geometry arises from the physics of viable self-maintenance, and human language about emotions encodes the same geometry the protocells produce.

Source code

Study record — canonical metadata, result path, status, seeds, and key finding.

  • Full pipeline: vignette extraction, VLM prompting, RSA analysis
  • Pre-registered experiment design