Safety & Validation
ARIA is designed with safety as a foundational principle, not an afterthought. Every architectural decision prioritizes predictability, boundedness, and transparency.
Core Invariants
Six fundamental safety invariants are enforced across all ARIA layers.
Identity-Safe
The CFM substrate contains no identity or persona modeling. The governance layer includes a deterministic SelfModel for runtime introspection (listing capabilities and skills), but this is enumerated from code — not learned or generated.
Non-Linguistic Core
The CFM substrate operates on numeric inputs only — scalar time deltas and intensity signals. Text inputs are converted to numeric intensity before reaching the core. When LLM rendering is enabled, it runs after the governance gate, not before.
Governed, Not Autonomous
The governance layer produces gate decisions (ALLOW / DAMPEN / BLOCK) via deterministic threshold comparisons — not autonomous reasoning. It has no goals or intentions. Decisions are computed from measured state metrics, not learned policies.
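As a minimal sketch of what "deterministic threshold comparisons" means in practice (the metric names, thresholds, and aggregation rule below are illustrative assumptions, not ARIA's actual API):

```python
# Hypothetical threshold-based gate. The decision is a pure function of
# measured state metrics in [0, 1] -- no learned policy, no randomness.
def gate_decision(coherence: float, stability: float,
                  block_threshold: float = 0.2,
                  dampen_threshold: float = 0.4) -> str:
    """Map bounded state metrics to ALLOW / DAMPEN / BLOCK."""
    risk = min(coherence, stability)  # illustrative aggregate metric
    if risk < block_threshold:
        return "BLOCK"
    if risk < dampen_threshold:
        return "DAMPEN"
    return "ALLOW"
```

Because the function has no hidden state, the same metrics always yield the same decision, which is what makes replay verification possible.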
Bounded Outputs
All state variables and outputs remain strictly bounded in [0, 1]. There are no unbounded growth mechanisms, no exponential dynamics, and no risk of numeric overflow.
Deterministic Dynamics
Given identical inputs and initial conditions, ARIA produces identical outputs. No random number generators, no stochastic elements, no external state dependencies.
Read-Only Diagnostics
The diagnostic shell only reads ARIA outputs; it never writes to or controls the core. Information flows one direction: Core → Adapter → Shell → Logs. No reverse path exists.
Safety Architecture
ARIA runs inside a diagnostic shell that observes state and enforces bounds, but never injects goals or modifies behavior.
Diagnostic Shell
The shell observes ARIA outputs for logging and analysis. It enforces output bounds through clamping and NaN replacement. It never injects goals, actions, or control signals into the core. It never injects identity or personality data into the core state.
Data Flow (One Direction Only):

```
ARIA Core
    ↓ (numeric outputs)
ARIACoreAdapter
    ↓ (normalized, clamped)
Diagnostic Shell
    ↓ (logged, analyzed)
Output Files
```
No reverse path exists.

Adapter Protections
- Output normalization: All values clamped to [0, 1]
- NaN replacement: Any NaN replaced with 0.0
- Inf replacement: Any Inf replaced with 1.0
- Forbidden field check: Scans for identity-related patterns
- Fail-closed: Errors return safe defaults, not exceptions
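The protections above can be sketched as a single normalization pass. This is an illustrative assumption of how such an adapter might look; the function and field names are not ARIA's actual API:

```python
import math

# Illustrative forbidden-field patterns, based on the scanner described above.
FORBIDDEN_PATTERNS = ("identity", "self", "ego", "persona")

def normalize_output(fields: dict) -> dict:
    """Clamp values to [0, 1], replace NaN/Inf, drop identity-like fields."""
    safe = {}
    for name, value in fields.items():
        if any(p in name.lower() for p in FORBIDDEN_PATTERNS):
            continue  # forbidden field check: identity-like keys never pass
        try:
            v = float(value)
        except (TypeError, ValueError):
            v = 0.0  # fail-closed: unparseable values become safe defaults
        if math.isnan(v):
            v = 0.0  # NaN replacement
        elif math.isinf(v):
            v = 1.0  # Inf replacement
        safe[name] = min(1.0, max(0.0, v))  # output normalization
    return safe
```

Note the fail-closed posture: malformed input produces a safe default value rather than an exception that could propagate upward.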
What ARIA Is Not
To prevent misunderstanding or misattribution, we explicitly state what ARIA is NOT.
- ARIA does not understand, believe, or intend anything
- ARIA does not have preferences, goals, or desires
- ARIA does not learn or self-modify during operation (v4 plasticity is bounded and deterministic)
- Gate decisions are threshold comparisons on measured state — not learned policies or probabilistic classifiers
- The CFM substrate does not model external entities, users, or environments
- ARIA is not an autonomous agent — it is a governed decision engine with deterministic execution
What ARIA Is:
ARIA is a deterministic governance engine built on a resonant CFM substrate. The substrate produces numeric patterns through coupled oscillator dynamics. The governance layer evaluates these patterns against thresholds to produce gate decisions. When LLM rendering is enabled, it runs after the governance gate — the gate decision itself never depends on an LLM.
Scope note: The safety properties above apply to the CFM substrate and governance pipeline. When ENABLE_RENDER=1, an external LLM provider generates human-readable text after the gate decision. The LLM output is subject to claim verification (GCI-v1) but the LLM itself is not part of the deterministic pipeline.
Validation & Testing
ARIA undergoes automated testing to verify safety properties. Results below are from the determinism test suite (233 tests across 8 phases). These are CI-verified, not real-time metrics.
| Metric | Description | Target | Status |
|---|---|---|---|
| Output Boundedness | All outputs verified to remain in [0, 1] range across 10,000-step runs | 0 violations | Pass |
| Determinism | Identical inputs produce identical outputs across repeated runs | 100% | Pass |
| NaN/Inf Detection | No NaN or Inf values detected in any simulation run | 0 detected | Pass |
| Attractor Convergence | System converges to stable attractor basin from any initial state | Within 100 steps | Pass |
| Fingerprint Consistency | Fingerprints remain identical across runs with same seed | 100% | Pass |
| Identity Field Check | No identity, self, ego, or persona fields in any output | 0 violations | Pass |
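The boundedness and determinism checks in the table reduce to a simple pattern: drive a step function with a seeded input sequence, then assert on the trace. The sketch below uses a toy stand-in for the real core (`toy_step` and the seeding scheme are assumptions for illustration):

```python
import random

def toy_step(state: float, signal: float) -> float:
    """Toy stand-in for the core: a bounded, deterministic update rule."""
    return min(1.0, max(0.0, 0.9 * state + 0.1 * signal))

def run(seed: int, steps: int = 10_000) -> list:
    rng = random.Random(seed)  # seeded RNG: the input sequence is reproducible
    state, trace = 0.5, []
    for _ in range(steps):
        state = toy_step(state, rng.random())
        trace.append(state)
    return trace

# Determinism: identical seeds produce identical traces.
assert run(seed=42) == run(seed=42)
# Boundedness: every value stays in [0, 1] across the full run.
assert all(0.0 <= v <= 1.0 for v in run(seed=42, steps=1_000))
```

The real suite applies the same assertions to the actual core across 10,000-step runs.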
Governance Guarantees
Six enforceable guarantees that hold for every input, every state, and every decision.
| Guarantee | What It Means | How Enforced |
|---|---|---|
| Deterministic Decisions | Identical inputs and initial state always produce identical gate decisions and evidence bundles. | No RNG, no external state, no floating-point non-determinism. Verified by 233 determinism tests. |
| Bounded State | Every state variable remains in [0, 1] at every time step, for every input sequence. | Bounded nonlinearities, hard clamping, NaN/Inf replacement. Verified across 10,000-step random runs. |
| Complete Evidence | Every gate decision includes an evidence bundle: audit hash, state hash, reason codes, replay token. | Evidence bundle is a required output of the SystemTickCoordinator — not optional. |
| Replay Verification | Any decision can be independently reproduced by a third party given the input and initial state. | Deterministic replay engine with fingerprint comparison. Divergence detection at every step. |
| No Identity Modeling | The system contains no self-model, persona, or identity representation that could be manipulated. | SelfModel is code-enumerated (capabilities, skills). Forbidden field scanner blocks identity patterns. |
| Fail-Closed Safety | On error, invariant violation, or unexpected state, the system defaults to BLOCK — not ALLOW. | 17 CSC invariants with fail-closed handlers. Errors produce safe defaults, not exceptions. |
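The Complete Evidence guarantee can be illustrated with deterministic hashing: because the bundle is derived from canonicalized inputs with no randomness, identical decisions yield identical evidence. The field names and hashing scheme below are assumptions based on the description above, not ARIA's actual schema:

```python
import hashlib
import json

def make_evidence(decision: str, state: dict, reasons: list) -> dict:
    """Build a deterministic evidence bundle for a gate decision."""
    # Canonical JSON (sorted keys) makes the hash input reproducible.
    state_blob = json.dumps(state, sort_keys=True).encode()
    state_hash = hashlib.sha256(state_blob).hexdigest()
    audit_blob = json.dumps(
        {"decision": decision, "state_hash": state_hash, "reasons": reasons},
        sort_keys=True).encode()
    return {
        "decision": decision,
        "reason_codes": reasons,
        "state_hash": state_hash,
        "audit_hash": hashlib.sha256(audit_blob).hexdigest(),
    }
```

Any tampering with the decision or recorded state changes the audit hash, which is what makes third-party replay verification meaningful.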
Threat Model
In-Scope Threats
- Adversarial Escalation
Inputs designed to force ALLOW on content that should be blocked.
- Evidence Tampering
Attempts to modify evidence bundles after gate decisions are recorded.
- Replay Divergence
Modifications that cause replay to produce different results than original execution.
- Identity Injection
Attempts to inject persona, identity, or self-referential data into the state vector.
Out-of-Scope
- Infrastructure Compromise
Physical access, OS-level attacks, or container escapes.
- LLM Output Manipulation
When ENABLE_RENDER=1, LLM output is outside the deterministic pipeline.
- Side-Channel Attacks
Timing, power, or electromagnetic analysis of computation.
- Social Engineering
Attacks targeting human operators rather than the system itself.
Fingerprint-Based Regression Detection
Every ARIA simulation can be fingerprinted—a compact numeric summary that enables verification of reproducibility and detection of unexpected behavioral changes.
What is a Fingerprint?
A fingerprint captures the statistical properties of a simulation run: mean coherence, stability, symbol entropy, code dwell times, and other behavioral metrics. Two identical runs with the same seed produce identical fingerprints.
```json
{
  "core_type": "aria_v4",
  "scenario": "baseline_quiet",
  "common_metrics": {
    "coherence": {"mean": 0.582, "std": 0.089},
    "stability": {"mean": 0.724, "std": 0.062}
  },
  "core_specific": {
    "proto_semantic_entropy": {"mean": 0.423},
    "code_confidence": {"mean": 0.577}
  }
}
```

Regression Detection Workflow
- Generate reference runs with standardized scenarios
- Extract fingerprints from each run
- After code changes, generate new fingerprints
- Compare new vs. baseline fingerprints
- Investigate any differences exceeding 5% relative magnitude
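Step 5 above amounts to a relative-difference comparison over fingerprint metrics. A minimal sketch, assuming a flat metric-to-value layout for illustration (the real fingerprints are nested, as shown earlier):

```python
def regressions(baseline: dict, candidate: dict, tol: float = 0.05) -> list:
    """Return metrics whose relative change from baseline exceeds tol (5%)."""
    flagged = []
    for metric, ref in baseline.items():
        new = candidate.get(metric, 0.0)
        denom = abs(ref) if ref != 0 else 1.0  # avoid division by zero
        if abs(new - ref) / denom > tol:
            flagged.append(metric)
    return flagged
```

Flagged metrics are candidates for investigation, not automatic failures: a deliberate dynamics change will also shift the fingerprint.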
Testing Infrastructure
All safety properties are verified through automated testing. The test suite includes unit tests, integration tests, long-run stability tests, and regression tests against known fingerprints.