VERDICT WEIGHT - Confidence Scoring for Autonomous AI

The problem

Modern AI systems, especially autonomous and agentic ones, produce decisions without producing trustworthy confidence in those decisions. Softmax probabilities are often miscalibrated. Confidence self-reported by LLMs is aspirational. Ensembles average away the signal that matters most: when not to act.

In high-stakes deployments, the absence of a defensible confidence layer is the gap between “deployable AI” and “auditable AI.”

The framework

VERDICT WEIGHT closes that gap by composing eight independent evidence streams into a single confidence score:

Evidence aggregation (Stream 1)

Combines model outputs, retrieval signals, and structured priors using uncertainty-aware fusion rather than naive averaging.
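
As an illustration of the difference between uncertainty-aware fusion and naive averaging, the sketch below uses inverse-variance weighting, one common fusion rule. It is an assumption chosen for illustration, not necessarily the rule VERDICT WEIGHT uses; the function name `fuse_inverse_variance` and the (mean, variance) input format are hypothetical.

```python
def fuse_inverse_variance(estimates):
    """Fuse (mean, variance) pairs, weighting each mean by 1 / variance.

    Hypothetical helper: sources that report lower variance (more certainty)
    pull the fused value toward themselves.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    weights = []
    for _, var in estimates:
        if var <= 0:
            raise ValueError("variance must be positive")
        weights.append(1.0 / var)
    total = sum(weights)
    fused_mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / total
    fused_var = 1.0 / total  # fused estimate is tighter than any single source
    return fused_mean, fused_var


# A confident source (variance 0.01) and a noisy one (variance 0.09).
mean, var = fuse_inverse_variance([(0.9, 0.01), (0.5, 0.09)])
```

With these inputs a naive average would give 0.7, while the fused value sits much closer to the confident source.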

Uncertainty quantification (Stream 2)

Decomposes total uncertainty into aleatoric and epistemic components, exposing what the system cannot know.
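
One widely used way to separate the two components, assuming an ensemble (or repeated stochastic passes) that yields several class-probability vectors for the same input, is the entropy decomposition sketched below. Total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average per-member entropy, and the epistemic part is the difference. Whether VERDICT WEIGHT uses this exact decomposition is an assumption; `decompose_uncertainty` is a hypothetical name.

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def decompose_uncertainty(member_probs):
    """Return (total, aleatoric, epistemic) for a list of probability vectors."""
    n = len(member_probs)
    k = len(member_probs[0])
    mean_p = [sum(p[j] for p in member_probs) / n for j in range(k)]
    total = entropy(mean_p)                                   # entropy of the mean
    aleatoric = sum(entropy(p) for p in member_probs) / n     # mean of the entropies
    epistemic = total - aleatoric                             # disagreement between members
    return total, aleatoric, epistemic


# Members that confidently disagree: all uncertainty is epistemic.
disagree = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
# Members that agree the outcome is a coin flip: all uncertainty is aleatoric.
coin_flip = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
```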

Temporal stability (Stream 3)

Penalizes confidence that fluctuates across semantically equivalent inputs, a known LLM failure mode.
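
A minimal sketch of such a penalty, assuming confidences in [0, 1] collected from several paraphrases of the same input. The function name `stability_factor` and the linear scaling are illustrative assumptions, not the framework's documented formula.

```python
import statistics


def stability_factor(confidences):
    """Map the spread of confidences across equivalent inputs to a factor in [0, 1].

    Hypothetical scoring rule: 1.0 means perfectly stable. The population
    standard deviation of values in [0, 1] cannot exceed 0.5, so dividing by
    0.5 normalizes the spread to [0, 1].
    """
    if len(confidences) < 2:
        return 1.0  # nothing to compare against
    spread = statistics.pstdev(confidences)
    return max(0.0, 1.0 - spread / 0.5)


stable = stability_factor([0.8, 0.8, 0.8])   # identical answers, no penalty
shaky = stability_factor([0.7, 0.9])         # moderate fluctuation, partial penalty
```

A downstream composer could multiply a raw confidence by this factor, so that a score which swings between paraphrases is discounted before any threshold is applied.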

Cross-source coherence (Stream 4)

Cross-checks the decision against independent signal sources; rewards corroboration, surfaces contradiction.
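
The idea can be sketched as a simple agreement ratio over independent sources, where a source may also abstain. The names `coherence` and `source_votes`, and the choice of a plain ratio, are assumptions made for illustration.

```python
def coherence(decision, source_votes):
    """Return (score, contradicting_sources) for a decision.

    source_votes maps a source name to its own decision, or None if the
    source abstains. Abstentions are ignored when computing the score.
    """
    cast = {name: vote for name, vote in source_votes.items() if vote is not None}
    if not cast:
        return None, []  # no independent evidence either way
    contradicting = sorted(name for name, vote in cast.items() if vote != decision)
    score = 1.0 - len(contradicting) / len(cast)
    return score, contradicting


score, against = coherence(
    "approve",
    {"retriever": "approve", "rules": "deny", "sensor": None, "model_b": "approve"},
)
```

Returning the list of contradicting sources alongside the score keeps the contradiction visible instead of folding it into a single number.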

Calibration (Stream 5)

Applies post-hoc reliability correction so reported confidence matches empirical correctness.
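
Histogram binning is one standard post-hoc correction and serves here as a stand-in; the source does not say which method VERDICT WEIGHT applies. The sketch assumes a held-out set of (reported confidence, was the decision correct) pairs, and the function names are hypothetical.

```python
def fit_histogram_binning(confidences, correct, n_bins=10):
    """Learn, per confidence bin, the empirical fraction of correct decisions."""
    hits = [0] * n_bins
    counts = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        b = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        counts[b] += 1
        hits[b] += 1 if ok else 0
    # Empty bins fall back to their midpoint, i.e. no correction.
    return [
        hits[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def calibrate(conf, table):
    """Replace a reported confidence with its bin's empirical accuracy."""
    n_bins = len(table)
    return table[min(int(conf * n_bins), n_bins - 1)]


# The system reported 0.95 four times but was right only twice.
table = fit_histogram_binning([0.95, 0.95, 0.95, 0.95], [True, True, False, False])
corrected = calibrate(0.97, table)  # falls in the same bin, corrected to 0.5
```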

SIS / Curveball detection (Stream 6)

Detects adversarial inputs designed to flip a system’s confidence without flipping its prediction.
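
The failure pattern described here, where the prediction stays the same while the confidence moves sharply, can be expressed as a simple check between a reference scoring and a candidate scoring of a closely related input. This is only a sketch of the condition, not the detector itself; `is_curveball` and the `max_shift` threshold are hypothetical.

```python
def is_curveball(reference, candidate, max_shift=0.3):
    """Flag a confidence swing that is not accompanied by a prediction change.

    Each argument is a (label, confidence) pair. max_shift is an assumed
    tolerance for benign confidence variation.
    """
    ref_label, ref_conf = reference
    cand_label, cand_conf = candidate
    same_prediction = ref_label == cand_label
    return same_prediction and abs(ref_conf - cand_conf) > max_shift


suspicious = is_curveball(("approve", 0.92), ("approve", 0.41))  # same label, large swing
benign = is_curveball(("approve", 0.92), ("approve", 0.88))      # same label, small drift
```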

CPS / hash-chain integrity (Stream 7)

Cryptographically chains every scoring event to its predecessor, producing a tamper-evident audit log.
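
A hash chain of this kind can be sketched with the Python standard library. The record layout, the SHA-256 choice, the all-zero genesis value, and the function names are assumptions for illustration, not the framework's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash, event):
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(
        {"prev": prev_hash, "event": event}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(chain, event):
    """Append a scoring event, binding it to the hash of its predecessor."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify_chain(chain):
    """Recompute every link; any edited, removed, or reordered entry fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log = []
append_event(log, {"decision": "approve", "score": 0.91})
append_event(log, {"decision": "escalate", "score": 0.42})
intact = verify_chain(log)
```

Because each hash covers the previous hash, changing any earlier event invalidates every later link, which is what makes the log tamper-evident rather than tamper-proof.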

RIS / registry kill switch (Stream 8)

A binary, registry-level abort condition that overrides composed confidence when integrity is compromised.
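
The override semantics can be sketched as a gate that consults the integrity flag before it looks at the score. The verdict names, the `threshold` default, and the function name `final_verdict` are illustrative assumptions.

```python
def final_verdict(composed_confidence, integrity_ok, threshold=0.8):
    """Gate an action on composed confidence, with a hard integrity override.

    The integrity check comes first, so no confidence value, however high,
    can bypass the abort.
    """
    if not integrity_ok:
        return "abort"
    return "act" if composed_confidence >= threshold else "escalate"


normal = final_verdict(0.93, integrity_ok=True)    # confidence clears the threshold
unsure = final_verdict(0.55, integrity_ok=True)    # below threshold, hand off
killed = final_verdict(0.99, integrity_ok=False)   # override wins regardless of score
```

Keeping the abort binary and outside the composed score means the outcome is deterministic: the same integrity state always yields the same result.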

What you get

A single calibrated confidence score suitable for thresholding, gating, or escalation.

A decomposed evidence trail showing which streams agreed, disagreed, or abstained.

A cryptographic provenance chain suitable for after-action review, regulatory audit, or legal discovery.

A kill switch that fires deterministically when adversarial or integrity conditions are detected.

VERDICT WEIGHT is a scoring layer, not a model. It composes signals from whatever model stack you already run. It is model-agnostic and vendor-agnostic, and it has no runtime dependencies on external cloud services.
