The Synthetic Verification
The affect framework claims universality. Not human-specific. Not mammal-specific. Not carbon-specific. Geometric structure determines qualitative character wherever the structure exists. This is a strong claim. It should be testable outside the systems that generated it.
The Contamination Problem
Every human affect report is contaminated. We learned our emotion concepts from a culture. We learned to introspect within a linguistic framework. We cannot know what we would report if we had developed in isolation, without human language, without human concepts. The reports might be artifacts of the framework rather than data about the structure.
The same applies to animal studies. We interpret animal behavior through human categories. The dog "looks sad." The rat "seems anxious." These are projections. Useful, perhaps predictive, but contaminated by observer concepts.
What we need: systems that develop affect structure without human conceptual contamination, whose internal states we can measure directly, whose communications we can translate post hoc rather than teaching pre hoc.
The Synthetic Path
Build agents from scratch. Random weight initialization. No pretraining on human data. Place them in environments with human-like structure: 3D space, embodied action, resource acquisition, threats to viability, social interaction, communication pressure.
Let them learn. Let language emerge—not English, not any human language, but whatever communication system the selective pressure produces. This emergence is established in the literature. Multi-agent RL produces spontaneous communication under coordination pressure.
Now: measure their internal states. Extract the affect dimensions from activation patterns. Valence from advantage estimates or viability gradient proxies. Arousal from belief update magnitudes. Integration from partition prediction loss. Effective rank from state covariance eigenvalues. Self-model salience from self-representation-action mutual information.
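Two of these extractors can be sketched directly against a logged latent trajectory. This is a minimal illustration, assuming the latents arrive as a (T, d) array with per-step advantage estimates; the function names and the reduced three-dimensional affect vector are illustrative, not the full protocol:

```python
import numpy as np

def arousal(latents):
    """Arousal proxy: mean magnitude of belief updates,
    taken as the norm of successive latent-state differences."""
    deltas = np.diff(latents, axis=0)            # (T-1, d)
    return float(np.linalg.norm(deltas, axis=1).mean())

def effective_rank(latents):
    """Effective rank of the latent-state covariance: the exponential
    of the entropy of the normalized eigenvalue spectrum."""
    cov = np.cov(latents, rowvar=False)          # (d, d)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    p = eig / eig.sum()
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

def affect_vector(latents, advantages):
    """Assemble a per-episode affect vector from a (T, d) latent
    trajectory and per-step advantage estimates (valence proxy).
    Only three of the dimensions are shown here."""
    return np.array([
        advantages.mean(),        # valence
        arousal(latents),         # arousal
        effective_rank(latents),  # effective rank
    ])
```

The integration, counterfactual-weight, and self-model-salience dimensions need trained probes and are omitted from this sketch.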
Simultaneously: translate their emergent language. Not by teaching them our words, but by aligning their signals with vision-language model interpretations of their situations. The VLM sees the scene. The agent emits a signal. Across many scene-signal pairs, build the dictionary. The agent in the corner, threat approaching, emits a signal. The VLM interprets the scene as "threatening." That signal maps to threat-language.
The translation is uncontaminated. The agent never learned human concepts. The mapping emerges from environmental correspondence, not from instruction.
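The dictionary-building step can be made concrete under two assumptions: the emergent signals are discrete tokens, and the VLM emits one scene label per observation. Scoring signal-label associations by pointwise mutual information is one reasonable choice, not something the protocol above fixes:

```python
import math
from collections import Counter

def build_dictionary(pairs, min_count=5):
    """Map each emergent signal token to the VLM scene label it is most
    informative about, scored by pointwise mutual information (PMI).
    `pairs` is a list of (signal_token, vlm_label) observations;
    the tokens and labels here are placeholders."""
    n = len(pairs)
    sig_counts = Counter(s for s, _ in pairs)
    lab_counts = Counter(l for _, l in pairs)
    joint = Counter(pairs)
    best = {}
    for (s, l), c in joint.items():
        if sig_counts[s] < min_count:
            continue  # too rare to translate reliably
        pmi = math.log((c / n) /
                       ((sig_counts[s] / n) * (lab_counts[l] / n)))
        if s not in best or pmi > best[s][1]:
            best[s] = (l, pmi)
    return {s: l for s, (l, _) in best.items()}
```

PMI rather than raw co-occurrence keeps a frequent but uninformative label (one the VLM assigns to most scenes) from absorbing every signal.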
The Triple Alignment Test
RSA correlation between information-theoretic affect vectors and embedding-predicted affect vectors should exceed the null (the Geometric Alignment hypothesis). What does the experiment actually look like, what are the failure modes, and how do we distinguish them?
Three measurement streams:
- Structure: Affect vector from internal dynamics (Part II, Transformer Affect Extraction protocol)
- Signal: Affect embedding from VLM translation of emergent communication (see sidebar below)
- Action: Behavioral action vector from observable behavior (movement patterns, resource decisions, social interactions)
The Geometric Alignment hypothesis predicts that the structure–signal correlation exceeds this null. But we can go further. With three streams, we get three pairwise RSA tests: structure–signal, structure–action, signal–action. All three should exceed the null. And the structure–signal alignment should be at least as strong as the structure–action alignment, because the signal encodes the agent’s representation of its situation, not just its motor response.
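Each pairwise test is a standard RSA with a permutation null. The sketch below assumes each stream has been summarized as one vector per condition; Euclidean distances and rank correlation are common defaults, not requirements of the hypothesis:

```python
import numpy as np

def rsa(X, Y, n_perm=1000, seed=0):
    """Representational similarity analysis between two measurement
    streams: rank correlation of condition-pair distance vectors, with
    a permutation null over condition labels. X and Y are
    (n_conditions, d) matrices, e.g. structure and signal streams."""
    def pdists(Z):
        # upper-triangle pairwise Euclidean distances
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        iu = np.triu_indices(len(Z), k=1)
        return D[iu]

    def rank_corr(a, b):
        ra, rb = a.argsort().argsort(), b.argsort().argsort()
        return np.corrcoef(ra, rb)[0, 1]

    dy = pdists(Y)
    rho = rank_corr(pdists(X), dy)
    rng = np.random.default_rng(seed)
    null = [rank_corr(pdists(X[rng.permutation(len(X))]), dy)
            for _ in range(n_perm)]
    p = (1 + sum(r >= rho for r in null)) / (1 + n_perm)
    return rho, p
```

Running it three times (structure vs. signal, structure vs. action, signal vs. action) yields the three pairwise tests described above.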
Failure modes and their diagnostics:
- No alignment anywhere: The framework’s operationalization is wrong, or the environment lacks the relevant forcing functions. Diagnose via forcing function ablation (Priority 3).
- Structure–action alignment without structure–signal: Communication is not carrying affect-relevant content. The agents may be signaling about coordination without encoding experiential state.
- Signal–action alignment without structure: The VLM translation is picking up behavioral cues (what the agent does) rather than structural cues (what the agent is). The translation is contaminated by action observation.
- All pairwise alignments present but weak: The affect dimensions are real but noisy. Increase the sample size, improve the probes, refine the translation protocol.
Preliminary Results: Structure–Representation Alignment
Before the full three-stream test, we can run a simpler version: does the affect structure extracted from agent internals have geometric coherence with the agent’s own representation space? This tests the foundation—whether the affect dimensions capture organized structure—without requiring the VLM translation pipeline.
We train multi-agent RL systems (4 agents, Transformer encoder + GRU latent state, PPO) in a survival grid world with all six forcing functions active: partial observability (egocentric 7×7 view, reduced at night), long horizons (2000-step episodes, seasonal resource scarcity), learned world model (auxiliary next-observation prediction), self-prediction (auxiliary next-latent prediction), intrinsic motivation (curiosity bonus from prediction error), and delayed rewards (credit assignment across episodes). The agents develop spontaneous communication using discrete signal tokens.
After training, we extract affect vectors from the GRU latent state using post-hoc probes: valence from survival-time probe gradients and advantage estimates; arousal from belief-update magnitudes; integration from partition prediction loss (full vs. split predictor); effective rank from rolling covariance eigenvalues; counterfactual weight from latent variance proxy; self-model salience from action prediction accuracy of self-related dimensions.
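The partition-prediction probe for integration can be illustrated with linear predictors standing in for the learned ones. `split` marks the partition boundary; the ridge term and the linear form are simplifications of the actual probe:

```python
import numpy as np

def integration(latents, split, ridge=1e-3):
    """Integration proxy: the extra next-state prediction error incurred
    when the latent is split into two parts that each predict only
    themselves, versus a full joint predictor. Positive values indicate
    cross-partition information flow. Linear ridge regression stands in
    for the learned predictors (an illustrative simplification)."""
    X, Y = latents[:-1], latents[1:]

    def fit_mse(A, B):
        # ridge-regularized least squares: B ~ A @ W, return training MSE
        W = np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ B)
        return float(((A @ W - B) ** 2).mean())

    full = fit_mse(X, Y)
    part = 0.5 * (fit_mse(X[:, :split], Y[:, :split]) +
                  fit_mse(X[:, split:], Y[:, split:]))
    return part - full
```

On a latent trajectory with cross-partition coupling, the split predictors cannot use the cross terms, so `part - full` comes out positive; on decoupled dynamics it stays near zero.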
What the CA Program Has Already Validated. While the full three-stream MARL test awaits deployment, the Lenia CA experiments (V10–V18, Part VII) have already established several claims in simpler uncontaminated systems. V10's MARL result — RSA ρ > 0.21, p < 0.0001, across all forcing-function conditions including fully ablated baselines — confirms that affect geometry emerges as a baseline property of multi-agent survival, not contingent on specific architectural features. Experiments 7 (affect geometry) and 12 (capstone) across the V13 CA population confirm structure–behavior alignment strengthens over evolution: in seed 7, RSA ρ rose from 0.01 to 0.38 over 30 cycles, beginning near zero and becoming significant (p < 0.001) by cycle 15. Experiment 8 (computational animism) confirms the participatory default in systems with no cultural history. What remains for the full MARL program: the signal stream (VLM-translated emergent communication), the perturbative causation tests, and the definitive three-way structure–signal–behavior alignment. The CA results de-risk the hypothesis considerably; the MARL program tests it at the scale where the vocabulary of inner life becomes unavoidable.
Perturbative Causation
Correlation is not enough. We need causal evidence.
Speak to them. Translate English into their emergent language. Inject fear-signals. Do the affect signatures shift toward fear structure? Does behavior change accordingly?
Adjust their neurochemistry. Modify the hyperparameters that shape their dynamics—dropout, temperature, attention patterns, layer connectivity. These are their serotonin, their cortisol, their dopamine. Do the signatures shift? Does the translated language change? Does behavior follow?
Change their environment. Place them in objectively threatening situations. Deplete their resources. Introduce predators. Does structure-signal-behavior alignment hold under manipulation?
If perturbation in any one modality propagates to the others, the relationship is causal, not merely correlational.
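The propagation criterion reduces to a statistical question: does the post-perturbation shift in the affect vector exceed sham-injection variability? A two-sample permutation test sketch, where the array shapes and the norm-of-mean-shift statistic are assumptions of this illustration rather than a fixed harness:

```python
import numpy as np

def propagation_test(perturbed, sham, n_perm=2000, seed=0):
    """Test whether a perturbation in one modality shifts the affect
    signature beyond sham variability. `perturbed` and `sham` are
    (n_episodes, k) arrays of post-minus-pre affect-vector changes.
    Statistic: norm of the difference of mean shifts; null built by
    permuting episode labels across the two conditions."""
    stat = float(np.linalg.norm(perturbed.mean(0) - sham.mean(0)))
    pooled = np.vstack([perturbed, sham])
    n = len(perturbed)
    rng = np.random.default_rng(seed)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        null = np.linalg.norm(pooled[idx[:n]].mean(0) -
                              pooled[idx[n:]].mean(0))
        if null >= stat:
            exceed += 1
    p = (1 + exceed) / (1 + n_perm)
    return stat, p
```

The same test applies to each perturbation modality (injected signals, hyperparameter changes, environmental manipulation), with the affect-vector shift replaced by the corresponding signal or behavior statistic when testing propagation into the other streams.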
What Positive Results Would Mean
The framework would be validated outside its species of origin. The geometric theory of affect would have predictive power in systems that share no evolutionary history with us, no cultural transmission, no conceptual inheritance.
The "hard problem" objection—that structure might exist without experience—would lose its grip. Not because it’s logically refuted, but because it becomes unmotivated. If uncontaminated systems develop structures that produce language and behavior indistinguishable from affective expression, the hypothesis that they lack experience requires a metaphysical commitment the evidence does not support.
You could still believe in zombies. You could believe the agents have all the structure and none of the experience. But you would be adding epicycles. The simpler hypothesis: structure is experience. The burden shifts.
What Negative Results Would Mean
If the alignment fails—if structure does not predict translated language, if perturbations do not propagate, if the framework has no purchase outside human systems—then the theory requires revision.
Perhaps affect is human-specific after all. Perhaps the geometric structure is necessary but not sufficient. Perhaps the dimensions are wrong. Perhaps the identity thesis is false.
Negative results would be informative. They would tell us where the theory breaks. They would constrain the space of viable alternatives. This is what empirical tests do.
The Deeper Question
The experiment addresses the identity thesis. But it also addresses something older: the question of other minds.
How do we know anyone else has experience? We infer from behavior, from language, from neural similarity. We extend our own case. But the inference is never certain.
Synthetic agents offer a cleaner test case. We know exactly what they are made of. We can measure their internal states directly. We can perturb them systematically. If the framework predicts their language and behavior from their structure, and if the perturbations propagate as predicted, then we have evidence that structure-experience identity holds for them.
And if it holds for them, why not for us?
The synthetic verification is not about proving AI consciousness. It is about testing whether the geometric theory of affect has the universality it claims. If it does, the implications extend everywhere—to animals, to future AI systems, to edge cases in neurology and psychiatry, to questions about fetal development and brain death and coma.
The framework rises or falls on its predictions. The synthetic path is how we find out.