VERDICT WEIGHT - Confidence Scoring for Autonomous AI

Design principles

VERDICT WEIGHT is built on five design principles, in priority order:

Composition over monolith

Each stream is independently testable, independently replaceable, and produces an interpretable signal. The composed score is a function of those signals, not a black box trained over them.

Calibration over rank-order

Confidence reported by the system must match correctness observed in the wild. A score of 0.9 must mean “right ~90% of the time.”
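One common way to check this property is expected calibration error (ECE), which bins predictions by reported confidence and compares each bin's average confidence with its observed accuracy. The sketch below is illustrative only: the function name, the equal-width binning, and the default of 10 bins are assumptions, not part of VERDICT WEIGHT's documented interface.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    outcomes: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted mean gap between reported confidence and observed accuracy.

    Hypothetical helper: equal-width bins over [0, 1] are an assumption.
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group (confidence, outcome) pairs by confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, correct in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[idx].append((conf, correct))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

A value near 0 means a reported 0.9 really does correspond to roughly 90% correctness on the evaluated data; larger values indicate over- or under-confidence.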

Audit by default

Every scoring event is hashed, chained, and reproducible. Audit is not a feature toggle.

Deterministic kill conditions

Detected adversarial inputs and integrity violations trigger a binary, registry-level abort rather than a soft warning.

No external runtime dependencies

Scoring proceeds without contacting any external service, which makes the framework suitable for air-gapped and classified environments.

The eight streams

The framework is organized into two layers: core scoring streams (1–5), which produce the calibrated confidence value, and hardening streams (6–8), which protect that value against adversarial manipulation and tampering.

Stream 1: Evidence aggregation

Uncertainty-aware fusion of model outputs, retrieval, and structured priors.

Stream 2: Uncertainty quantification

Decomposition into aleatoric and epistemic components.
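The source does not say how Stream 2 performs this decomposition. As one illustration, a widely used entropy-based split over an ensemble of predictive distributions treats the average per-member entropy as aleatoric uncertainty and the remainder of the total entropy as epistemic uncertainty. The ensemble input and the function names below are assumptions.

```python
import math
from typing import Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose_uncertainty(
    member_probs: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return (aleatoric, epistemic) for an ensemble of class distributions.

    Assumed technique: total = H(mean prediction), aleatoric = mean of H(member),
    epistemic = total - aleatoric (the disagreement between members).
    """
    n = len(member_probs)
    k = len(member_probs[0])
    mean = [sum(m[j] for m in member_probs) / n for j in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(m) for m in member_probs) / n
    epistemic = max(0.0, total - aleatoric)  # clamp tiny negative rounding error
    return aleatoric, epistemic
```

When every member predicts the same 50/50 split, all of the uncertainty is aleatoric; when members are individually certain but contradict each other, all of it is epistemic.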

Stream 3: Temporal stability

Penalizes confidence drift across semantically equivalent inputs.
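A minimal sketch of such a penalty, assuming the confidences for a set of semantically equivalent inputs are already available: it uses the population standard deviation as the drift measure and scales the base confidence down accordingly. The drift measure, the `scale` parameter, and both function names are hypothetical choices rather than the framework's documented behavior.

```python
import statistics
from typing import Sequence


def stability_penalty(confidences: Sequence[float], scale: float = 1.0) -> float:
    """Penalty in [0, 1] that grows with spread across equivalent inputs.

    Assumption: drift is measured as the population standard deviation.
    """
    if len(confidences) < 2:
        return 0.0  # nothing to compare against
    return min(1.0, scale * statistics.pstdev(confidences))


def apply_stability(
    base_confidence: float,
    variant_confidences: Sequence[float],
    scale: float = 1.0,
) -> float:
    """Shrink the base confidence in proportion to the observed drift."""
    return base_confidence * (1.0 - stability_penalty(variant_confidences, scale))
```

Identical confidences across paraphrases leave the score untouched, while a swing between 0.9 and 0.5 yields a penalty of about 0.2 at the default scale.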

Stream 4: Cross-source coherence

Rewards corroboration across independent signal sources.

Stream 5: Calibration

Post-hoc reliability correction.
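The specific correction method is not stated here. Histogram binning is one standard post-hoc technique and serves as a stand-in below: a table fitted on held-out data maps each raw-confidence bin to the accuracy actually observed in that bin. The function names, the bin count, and the midpoint fallback for empty bins are assumptions.

```python
from typing import List, Sequence


def fit_histogram_binning(
    confidences: Sequence[float],
    outcomes: Sequence[bool],
    n_bins: int = 10,
) -> List[float]:
    """Build a lookup table: bin index -> observed accuracy on held-out data."""
    hits = [0] * n_bins
    counts = [0] * n_bins
    for conf, correct in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        hits[idx] += 1 if correct else 0
    # Assumption: bins with no held-out data fall back to their midpoint.
    return [
        hits[i] / counts[i] if counts[i] else (i + 0.5) / n_bins
        for i in range(n_bins)
    ]


def calibrate(raw_confidence: float, table: Sequence[float]) -> float:
    """Replace a raw confidence with the accuracy observed for its bin."""
    n_bins = len(table)
    return table[min(int(raw_confidence * n_bins), n_bins - 1)]
```

If held-out predictions around 0.95 were right only three times out of four, a new raw score of 0.97 would be corrected to 0.75.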

Stream 6: SIS / Curveball detection

Detects adversarial inputs designed to perturb confidence.

Stream 7: CPS / hash-chain integrity

Cryptographic provenance for every scoring event.
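To make the idea concrete, here is a minimal hash-chain sketch using SHA-256 from Python's standard library. Each record stores the hash of its predecessor, so altering any earlier event invalidates every later hash. The record layout, the all-zero genesis value, and the function names are assumptions and do not describe the framework's actual on-disk format.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed placeholder hash for the first record


def _digest(prev_hash: str, payload: Dict[str, Any]) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Append a scoring event linked to the hash of the previous record."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    record = {
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": _digest(prev_hash, payload),
    }
    chain.append(record)
    return record


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link in one pass over the chain."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["payload"]):
            return False
        prev_hash = record["hash"]
    return True
```

Verification touches each record exactly once, so its cost grows linearly with the chain length.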

Stream 8: RIS / registry kill switch

Binary abort on integrity compromise.

Composition

The streams compose under a formal rule documented in Eight-stream composition. Two properties of that rule are worth surfacing here:

Hardening streams have veto power. A compromised audit chain (Stream 7) or a triggered kill switch (Stream 8) overrides the composed core score regardless of how high it would otherwise be.
Abstention is a first-class output. When the core streams disagree past a configurable threshold, the framework returns abstention rather than a forced classification. Abstention is the right answer often enough that producing it is a feature, not a failure mode.
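The two properties above can be sketched as a short decision function. This is not the formal rule from Eight-stream composition: the `Verdict` type, the use of max-minus-min spread as the disagreement measure, the default threshold of 0.3, and the plain mean as the composed score are all placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Verdict:
    status: str  # "score", "abstain", or "abort"
    score: Optional[float] = None


def compose(
    core_scores: Sequence[float],
    chain_intact: bool,
    kill_switch_triggered: bool,
    disagreement_threshold: float = 0.3,
) -> Verdict:
    """Apply hardening vetoes first, then abstention, then the core score."""
    # Hardening streams override the core score no matter how high it is.
    if kill_switch_triggered or not chain_intact:
        return Verdict("abort")

    # Assumed disagreement measure: spread between the core stream signals.
    spread = max(core_scores) - min(core_scores)
    if spread > disagreement_threshold:
        return Verdict("abstain")

    # Placeholder aggregation; the documented rule may differ.
    return Verdict("score", sum(core_scores) / len(core_scores))
```

Checking the vetoes before anything else means a high core score can never mask a broken audit chain, and abstention is returned as an ordinary result rather than raised as an error.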

Computational profile

Layer	Complexity
Core streams (single decision)	O(1)
Pipeline (K decisions)	O(K)
Audit verification (chain length N)	O(N)

See Complexity analysis for the full derivation and empirical wall-clock measurements.

What the framework does not do

It does not train a model. VERDICT WEIGHT scores decisions produced by an upstream model stack you already control.
It does not replace human review. It produces a calibrated gate. The gate threshold is a policy decision; it is not the framework’s job to decide what is “confident enough.”
It does not promise to detect every adversarial input. Stream 6 raises the cost of a particular attack class (Curveball / confidence-flip). It does not claim universal robustness, and the head-to-head comparison is honest about what is and is not measured.
