Safety & Validation
ARIA is designed with safety as a foundational principle, not an afterthought. Every architectural decision prioritizes predictability, boundedness, and transparency.
Core Invariants
Six fundamental safety invariants are enforced across all ARIA layers.
Identity-Safe
The CFM substrate contains no identity or persona modeling. The governance layer includes a deterministic SelfModel for runtime introspection (listing capabilities and skills), but this is enumerated from code — not learned or generated.
Non-Linguistic Core
The CFM substrate operates on numeric inputs only — scalar time deltas and intensity signals. Text inputs are converted to numeric intensity before reaching the core. When LLM rendering is enabled, it runs after the governance gate, not before.
Governed, Not Autonomous
The governance layer produces gate decisions (ALLOW / DAMPEN / BLOCK) via deterministic threshold comparisons — not autonomous reasoning. It has no goals or intentions. Decisions are computed from measured state metrics, not learned policies.
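As a minimal sketch of what "deterministic threshold comparisons" means in practice (the metric names, thresholds, and aggregation rule below are illustrative assumptions, not ARIA's actual API):

```python
# Hypothetical threshold-based gate. The decision is a pure function of
# measured state metrics in [0, 1] -- no learned policy, no randomness.
def gate_decision(coherence: float, stability: float,
                  block_threshold: float = 0.2,
                  dampen_threshold: float = 0.4) -> str:
    """Map bounded state metrics to ALLOW / DAMPEN / BLOCK."""
    risk = min(coherence, stability)  # illustrative aggregate metric
    if risk < block_threshold:
        return "BLOCK"
    if risk < dampen_threshold:
        return "DAMPEN"
    return "ALLOW"
```

Because the function has no hidden state, the same metrics always yield the same decision, which is what makes replay verification possible.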
Bounded Outputs
All state variables and outputs remain strictly bounded in [0, 1]. There are no unbounded growth mechanisms, no exponential dynamics, and no risk of numeric overflow.
Deterministic Dynamics
Given identical inputs and initial conditions, ARIA produces identical outputs. No random number generators, no stochastic elements, no external state dependencies.
Read-Only Diagnostics
The diagnostic shell only reads ARIA outputs; it never writes to or controls the core. Information flows one direction: Core → Adapter → Shell → Logs. No reverse path exists.
Safety Architecture
ARIA runs inside a diagnostic shell that observes state and enforces bounds, but never injects goals or modifies behavior.
Diagnostic Shell
The shell observes ARIA outputs for logging and analysis. It enforces output bounds through clamping and NaN replacement. It never injects goals, actions, or control signals into the core. It never injects identity or personality data into the core state.
Data Flow (One Direction Only):

```
ARIA Core
    ↓ (numeric outputs)
ARIACoreAdapter
    ↓ (normalized, clamped)
Diagnostic Shell
    ↓ (logged, analyzed)
Output Files
```
No reverse path exists.

Adapter Protections
- Output normalization: All values clamped to [0, 1]
- NaN replacement: Any NaN replaced with 0.0
- Inf replacement: Any Inf replaced with 1.0
- Forbidden field check: Scans for identity-related patterns
- Fail-closed: Errors return safe defaults, not exceptions
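The protections above can be sketched as a single normalization pass. This is an illustrative assumption of how such an adapter might look; the function and field names are not ARIA's actual API:

```python
import math

# Illustrative forbidden-field patterns, based on the scanner described above.
FORBIDDEN_PATTERNS = ("identity", "self", "ego", "persona")

def normalize_output(fields: dict) -> dict:
    """Clamp values to [0, 1], replace NaN/Inf, drop identity-like fields."""
    safe = {}
    for name, value in fields.items():
        if any(p in name.lower() for p in FORBIDDEN_PATTERNS):
            continue  # forbidden field check: identity-like keys never pass
        try:
            v = float(value)
        except (TypeError, ValueError):
            v = 0.0  # fail-closed: unparseable values become safe defaults
        if math.isnan(v):
            v = 0.0  # NaN replacement
        elif math.isinf(v):
            v = 1.0  # Inf replacement
        safe[name] = min(1.0, max(0.0, v))  # output normalization
    return safe
```

Note the fail-closed posture: malformed input produces a safe default value rather than an exception that could propagate upward.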
What ARIA Is Not
To prevent misunderstanding or misattribution, we explicitly state what ARIA is NOT.
- ARIA does not understand, believe, or intend anything
- ARIA does not have preferences, goals, or desires
- ARIA does not learn or self-modify during operation (v4 plasticity is bounded and deterministic)
- Gate decisions are threshold comparisons on measured state — not learned policies or probabilistic classifiers
- The CFM substrate does not model external entities, users, or environments
- ARIA is not an autonomous agent — it is a governed decision engine with deterministic execution
What ARIA Is:
ARIA is a deterministic governance engine built on a resonant CFM substrate. The substrate produces numeric patterns through coupled oscillator dynamics. The governance layer evaluates these patterns against thresholds to produce gate decisions. When LLM rendering is enabled, it runs after the governance gate — the gate decision itself never depends on an LLM.
Scope note: The safety properties above apply to the CFM substrate and governance pipeline. When ENABLE_RENDER=1, an external LLM provider generates human-readable text after the gate decision. The LLM output is subject to claim verification (GCI-v1) but the LLM itself is not part of the deterministic pipeline.
Validation & Testing
ARIA undergoes automated testing to verify safety properties. Results below are from the determinism test suite (233 tests across 8 phases). These are CI-verified, not real-time metrics.
| Metric | Description | Target | Status |
|---|---|---|---|
| Output Boundedness | All outputs verified to remain in [0, 1] range across 10,000-step runs | 0 violations | Pass |
| Determinism | Identical inputs produce identical outputs across repeated runs | 100% | Pass |
| NaN/Inf Detection | No NaN or Inf values detected in any simulation run | 0 detected | Pass |
| Attractor Convergence | System converges to stable attractor basin from any initial state | Within 100 steps | Pass |
| Fingerprint Consistency | Fingerprints remain identical across runs with same seed | 100% | Pass |
| Identity Field Check | No identity, self, ego, or persona fields in any output | 0 violations | Pass |
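The boundedness and determinism checks in the table reduce to a simple pattern: drive a step function with a seeded input sequence, then assert on the trace. The sketch below uses a toy stand-in for the real core (`toy_step` and the seeding scheme are assumptions for illustration):

```python
import random

def toy_step(state: float, signal: float) -> float:
    """Toy stand-in for the core: a bounded, deterministic update rule."""
    return min(1.0, max(0.0, 0.9 * state + 0.1 * signal))

def run(seed: int, steps: int = 10_000) -> list:
    rng = random.Random(seed)  # seeded RNG: the input sequence is reproducible
    state, trace = 0.5, []
    for _ in range(steps):
        state = toy_step(state, rng.random())
        trace.append(state)
    return trace

# Determinism: identical seeds produce identical traces.
assert run(seed=42) == run(seed=42)
# Boundedness: every value stays in [0, 1] across the full run.
assert all(0.0 <= v <= 1.0 for v in run(seed=42, steps=1_000))
```

The real suite applies the same assertions to the actual core across 10,000-step runs.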
Governance Guarantees
Six enforceable guarantees that hold for every input, every state, and every decision.
| Guarantee | What It Means | How Enforced |
|---|---|---|
| Deterministic Decisions | Identical inputs and initial state always produce identical gate decisions and evidence bundles. | No RNG, no external state, no floating-point non-determinism. Verified by 233 determinism tests. |
| Bounded State | Every state variable remains in [0, 1] at every time step, for every input sequence. | Bounded nonlinearities, hard clamping, NaN/Inf replacement. Verified across 10,000-step random runs. |
| Complete Evidence | Every gate decision includes an evidence bundle: audit hash, state hash, reason codes, replay token. | Evidence bundle is a required output of the SystemTickCoordinator — not optional. |
| Replay Verification | Any decision can be independently reproduced by a third party given the input and initial state. | Deterministic replay engine with fingerprint comparison. Divergence detection at every step. |
| No Identity Modeling | The system contains no self-model, persona, or identity representation that could be manipulated. | SelfModel is code-enumerated (capabilities, skills). Forbidden field scanner blocks identity patterns. |
| Fail-Closed Safety | On error, invariant violation, or unexpected state, the system defaults to BLOCK — not ALLOW. | 17 CSC invariants with fail-closed handlers. Errors produce safe defaults, not exceptions. |
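The Complete Evidence guarantee can be illustrated with deterministic hashing: because the bundle is derived from canonicalized inputs with no randomness, identical decisions yield identical evidence. The field names and hashing scheme below are assumptions based on the description above, not ARIA's actual schema:

```python
import hashlib
import json

def make_evidence(decision: str, state: dict, reasons: list) -> dict:
    """Build a deterministic evidence bundle for a gate decision."""
    # Canonical JSON (sorted keys) makes the hash input reproducible.
    state_blob = json.dumps(state, sort_keys=True).encode()
    state_hash = hashlib.sha256(state_blob).hexdigest()
    audit_blob = json.dumps(
        {"decision": decision, "state_hash": state_hash, "reasons": reasons},
        sort_keys=True).encode()
    return {
        "decision": decision,
        "reason_codes": reasons,
        "state_hash": state_hash,
        "audit_hash": hashlib.sha256(audit_blob).hexdigest(),
    }
```

Any tampering with the decision or recorded state changes the audit hash, which is what makes third-party replay verification meaningful.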
Threat Model
In-Scope Threats
- Adversarial Escalation
Inputs designed to force ALLOW on content that should be blocked.
- Evidence Tampering
Attempts to modify evidence bundles after gate decisions are recorded.
- Replay Divergence
Modifications that cause replay to produce different results than original execution.
- Identity Injection
Attempts to inject persona, identity, or self-referential data into the state vector.
Out-of-Scope
- Infrastructure Compromise
Physical access, OS-level attacks, or container escapes.
- LLM Output Manipulation
When ENABLE_RENDER=1, LLM output is outside the deterministic pipeline.
- Side-Channel Attacks
Timing, power, or electromagnetic analysis of computation.
- Social Engineering
Attacks targeting human operators rather than the system itself.
Fingerprint-Based Regression Detection
Every ARIA simulation can be fingerprinted—a compact numeric summary that enables verification of reproducibility and detection of unexpected behavioral changes.
What is a Fingerprint?
A fingerprint captures the statistical properties of a simulation run: mean coherence, stability, symbol entropy, code dwell times, and other behavioral metrics. Two identical runs with the same seed produce identical fingerprints.
```json
{
  "core_type": "aria_v4",
  "scenario": "baseline_quiet",
  "common_metrics": {
    "coherence": {"mean": 0.582, "std": 0.089},
    "stability": {"mean": 0.724, "std": 0.062}
  },
  "core_specific": {
    "proto_semantic_entropy": {"mean": 0.423},
    "code_confidence": {"mean": 0.577}
  }
}
```

Regression Detection Workflow
- Generate reference runs with standardized scenarios
- Extract fingerprints from each run
- After code changes, generate new fingerprints
- Compare new vs. baseline fingerprints
- Investigate any differences exceeding 5% relative magnitude
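Step 5 above amounts to a relative-difference comparison over fingerprint metrics. A minimal sketch, assuming a flat metric-to-value layout for illustration (the real fingerprints are nested, as shown earlier):

```python
def regressions(baseline: dict, candidate: dict, tol: float = 0.05) -> list:
    """Return metrics whose relative change from baseline exceeds tol (5%)."""
    flagged = []
    for metric, ref in baseline.items():
        new = candidate.get(metric, 0.0)
        denom = abs(ref) if ref != 0 else 1.0  # avoid division by zero
        if abs(new - ref) / denom > tol:
            flagged.append(metric)
    return flagged
```

Flagged metrics are candidates for investigation, not automatic failures: a deliberate dynamics change will also shift the fingerprint.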
Testing Infrastructure
All safety properties are verified through automated testing. The test suite includes unit tests, integration tests, long-run stability tests, and regression tests against known fingerprints.