Part I: Foundations

The Self-Effect Regime

Introduction

As a controller becomes more capable, it increasingly shapes its own environment. The observations it receives are increasingly consequences of its own actions.

The self-effect ratio quantifies this shift. For a system with policy $\policy$ in environment $\mathcal{E}$:

$$\rho_t = \frac{\MI(\mathbf{a}_{1:t}; \mathbf{o}_{t+1} | \mathbf{x}_0)}{\entropy(\mathbf{o}_{t+1} | \mathbf{x}_0)}$$

where $\MI$ denotes (conditional) mutual information and $\entropy$ denotes (conditional) entropy. The ratio measures what fraction of the uncertainty in the next observation $\mathbf{o}_{t+1}$ is accounted for by the action history $\mathbf{a}_{1:t}$, given the initial state $\mathbf{x}_0$. For capable agents in structured environments, $\rho_t$ increases with agent capability, and in the limit:

$$\rho_t \to 1 \quad \text{as capability} \to \infty$$

(or as close to 1 as the environment’s intrinsic stochasticity allows).
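To make the ratio concrete, here is a minimal sketch for a toy discrete case. It assumes a single action variable and a single observation variable at a fixed $\mathbf{x}_0$, described by a joint probability table; the function names and the three example tables are illustrative and not taken from the experiments in this chapter. It relies on the standard identity $\MI(A; O) = \entropy(A) + \entropy(O) - \entropy(A, O)$.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability table (zero cells are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def self_effect_ratio(joint):
    """rho = I(A; O) / H(O) for a joint table joint[a, o] at a fixed x_0."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()            # normalise, in case counts were passed
    h_o = entropy(joint.sum(axis=0))       # H(O): marginal over actions
    h_a = entropy(joint.sum(axis=1))       # H(A): marginal over observations
    h_ao = entropy(joint)                  # H(A, O)
    mutual_info = h_a + h_o - h_ao
    return mutual_info / h_o if h_o > 0 else 0.0

# Hypothetical tables: rows index the action, columns the next observation.
passenger = np.outer([0.5, 0.5], [0.5, 0.5])        # observation independent of action
cause = np.array([[0.5, 0.0], [0.0, 0.5]])          # observation copies the action
noisy = np.array([[0.45, 0.05], [0.05, 0.45]])      # copy corrupted 10% of the time

for name, table in [("passenger", passenger), ("cause", cause), ("noisy", noisy)]:
    print(name, round(self_effect_ratio(table), 3))
```

In the noisy table the ratio stays strictly below 1 no matter how the actions are chosen, which is one way to read the remark about intrinsic stochasticity: whatever entropy remains in the observation once the actions are known cannot be attributed to them.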

Passenger or Cause?

There is a simple way to think about $\rho$. Imagine forking a system at time $t$: same starting state, but one copy takes its normal actions while the other takes completely random ones. After $k$ steps, how different are their observations?

If $\rho \approx 0$: the observations are nearly identical. The system is a passenger: its actions don’t change what happens to it. Its future is determined by the environment, not by what it does.

If $\rho > 0$: the observations diverge. The system is a cause: what it does changes what it subsequently perceives. Its future is partly authored by itself.
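The fork can be sketched in a few lines. The example below is an assumption-laden toy, not the measurement protocol behind the figures reported in this chapter: observations follow a hypothetical scalar recursion `o = 0.9 * o + coupling * a + noise`, both copies share the same environment noise, and the "normal" policy is a stand-in that always plays `+1`. The `coupling` parameter controls how strongly actions feed back into observations.

```python
import numpy as np

def rollout(actions, coupling, noise, o0=0.0):
    """Observations of a toy system: o <- 0.9 * o + coupling * a + noise."""
    obs, o = [], o0
    for a, n in zip(actions, noise):
        o = 0.9 * o + coupling * a + n
        obs.append(o)
    return np.array(obs)

def fork_divergence(coupling, k=50, n_forks=200, seed=0):
    """Mean absolute gap between the two copies' observations after k steps."""
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_forks):
        noise = rng.normal(0.0, 1.0, size=k)              # shared by both copies
        normal_actions = np.ones(k)                       # stand-in "normal" policy
        random_actions = rng.choice([-1.0, 1.0], size=k)  # the randomised copy
        a_obs = rollout(normal_actions, coupling, noise)
        b_obs = rollout(random_actions, coupling, noise)
        gaps.append(abs(a_obs[-1] - b_obs[-1]))
    return float(np.mean(gaps))

for c in (0.0, 0.1, 1.0):
    print(f"coupling={c}: divergence after k steps = {fork_divergence(c):.4f}")
```

With `coupling=0.0` the two copies see exactly the same observations (the passenger case); as the coupling grows, the fork diverges. The divergence here is a raw distance, not $\rho$ itself, so it should be read only as a qualitative illustration of the thought experiment.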

This distinction turns out to be architecturally fundamental. We measured it directly in two substrates:

Lenia (V13–V18): $\rho_{\text{sync}} \approx 0.003$. Patterns that evolved complex internal dynamics, memory channels, insulation fields, and directed motion all read as passengers. Their "actions" (chemotaxis, emission) are biases on a continuous fluid governed by FFT dynamics that integrate over the full grid. Whatever a pattern does is immediately folded back into the global field. The fork barely diverges.
Protocell agents (V20): $\rho_{\text{sync}} \approx 0.21$ from initialization. When an agent consumes resources at a location, that patch is depleted, so its future observations there are different. When it moves, it reaches different patches. When it emits a signal, a chemical trace persists. The fork diverges because actions have consequences that return as observations.

The gap — 0.003 versus 0.21 — is not about intelligence or evolutionary history. It appeared in V20 at cycle 0, before any selection pressure. It is purely architectural: does the substrate provide a loop where actions change the world and the changed world is what the agent observes next? Lenia doesn’t. Protocell agents do.
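The architectural point can be caricatured with two hand-made substrates. Everything below is hypothetical: `step_local` is a crude stand-in for a patch world where consumption depletes the cell the agent stands on, and `step_global` is a crude stand-in for a fully mixed field in which any local bias is averaged over the whole grid at once. Neither reproduces Lenia's FFT dynamics or the V20 protocell environment, and the numbers it prints are not expected to match 0.003 or 0.21.

```python
import numpy as np

def step_local(field, pos, action):
    """Patch substrate: the agent moves, then eats half of the patch it is on."""
    pos = (pos + action) % field.size
    field = field.copy()
    field[pos] *= 0.5                              # depletion persists locally
    return field, pos, field[pos]

def step_global(field, pos, action):
    """Mixed substrate: the action is a small bias, averaged over the grid."""
    field = field.copy()
    field[pos] += 0.01 * action
    field = np.full_like(field, field.mean())      # global mixing erases locality
    return field, pos, field[pos]

def fork_gap(step, k=20, size=64, seed=0):
    """Mean observation gap between a fixed-policy copy and a random copy."""
    rng = np.random.default_rng(seed)
    field0 = rng.uniform(0.5, 1.5, size=size)      # shared starting state
    fa, pa = field0.copy(), 0
    fb, pb = field0.copy(), 0
    gaps = []
    for _ in range(k):
        fa, pa, oa = step(fa, pa, 1)                          # always step right
        fb, pb, ob = step(fb, pb, int(rng.choice([-1, 1])))   # random actions
        gaps.append(abs(oa - ob))
    return float(np.mean(gaps))

print("local patches:", fork_gap(step_local))
print("global mixing:", fork_gap(step_global))
```

No learning or selection happens in either substrate; the difference in fork divergence comes entirely from whether the step function lets an action leave a local trace that the agent later observes.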

Why does this matter for self-modeling? Because a system cannot model itself as a cause if it isn’t one. The self-model pressure (the prediction advantage described in the next section) only activates when $\rho$ exceeds a critical threshold, $\rho > \rho_c$. Below that threshold, there is nothing to model: the self is not a significant latent variable in one’s own observations.