

Why a threat model matters here

A confidence-scoring framework that does not state its threat model is a framework that cannot be evaluated for fitness against any specific deployment. VERDICT WEIGHT’s threat model is documented explicitly so that operators, architects, and acquisition-side reviewers can decide whether the framework is the right shape for their problem — and so that the framework is not credited with defenses it does not provide. This page is the operational summary of the threat model. The formal treatment is in Paper 3.

Adversary capabilities considered

The framework’s defenses are designed against an adversary with the following capabilities:
| Capability | Considered |
| --- | --- |
| Knowledge of the public framework design | Yes |
| Knowledge of the deployed configuration | Partial — depends on operator OPSEC |
| Black-box query access to the deployed scorer | Yes |
| White-box knowledge of the upstream model | Yes (this is the upstream system’s threat model, not the framework’s) |
| Ability to modify inputs to the scorer | Yes (this is what the hardening streams target) |
| Ability to modify the scoring layer itself | Defended by Streams 7 and 8 |
| Ability to modify the audit log offline | Defended by Stream 7 |
| Adaptive white-box adversary against fingerprinting in Stream 6 | Not claimed defended — see the Curveball attack class |

Failure classes addressed

The framework’s failure-class taxonomy (F1–F8) is documented in Completeness proof. Summarized for operational use:
  • Standard model outputs are systematically overconfident. The framework corrects this through evidence-aware aggregation (Stream 1) and post-hoc reliability mapping (Stream 5).
  • Naive Bayesian fusion treats correlated sources as independent and produces overconfident posteriors. Stream 4 measures cross-source coherence directly.
  • Conflating noise with knowledge gaps produces miscalibrated out-of-distribution behavior. Stream 2 decomposes the two.
  • Paraphrasing flips confidence even when the prediction is unchanged, a known LLM failure mode. Stream 3 penalizes this directly.
  • Adversarial inputs can flip confidence without flipping the prediction, the natural attack against confidence-gated autonomy. Stream 6 detects them.
  • An attacker who can rewrite the audit trail can hide the basis of any past action. Stream 7’s hash chain makes this tamper-evident.
  • The framework itself is part of the threat surface. Stream 8’s registry kill switch trips automatically on integrity failure and refuses to score until an operator restores trust.
  • Most fusion frameworks force an answer even when their inputs disagree. The composition rule treats abstention as a first-class output.
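The correlated-fusion failure above is easy to see numerically. The sketch below is an illustration only, not the framework's Stream 4 rule: `naive_fusion` sums log-odds as if sources were independent, while `correlation_discounted_fusion` shrinks the evidence by a hypothetical effective-sample-size factor derived from an assumed average pairwise correlation `rho`.

```python
import math

def naive_fusion(probs):
    """Fuse per-source probabilities assuming independence (naive Bayes
    in log-odds space). Overconfident when sources are correlated."""
    log_odds = sum(math.log(p / (1 - p)) for p in probs)
    return 1 / (1 + math.exp(-log_odds))

def correlation_discounted_fusion(probs, rho):
    """Toy discount: scale the summed log-odds by an effective number of
    independent sources. Illustrative only, not the framework's rule."""
    n = len(probs)
    n_eff = n / (1 + rho * (n - 1))  # effective independent-source count
    log_odds = (n_eff / n) * sum(math.log(p / (1 - p)) for p in probs)
    return 1 / (1 + math.exp(-log_odds))

# Three sources, each 80% confident. If they are near-duplicates
# (rho = 0.9), the combined evidence is worth little more than one source.
probs = [0.8, 0.8, 0.8]
print(round(naive_fusion(probs), 3))                        # 0.985
print(round(correlation_discounted_fusion(probs, 0.9), 3))  # 0.815
```

The naive posterior (~0.985) is exactly what independence would justify; the discounted one stays close to a single source's 0.8, which is the honest answer when the three "sources" are echoes of each other.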

What is not defended

To be useful, a threat model has to state what is out of scope as clearly as what is in scope. The framework explicitly does not claim to defend against:
  • Compromise of the upstream model itself. If the model produces actively malicious outputs, the framework can detect inconsistency (Streams 3, 4) and out-of-distribution behavior (Stream 2), but it does not replace the model’s own integrity controls.
  • Adaptive adversaries with full white-box knowledge of fingerprinting. Stream 6 raises the cost of confidence-flip attacks; it does not claim provable security against an attacker who knows the framework, the fingerprint band, and the validation distribution and crafts inputs accordingly.
  • Failures of operator OPSEC. A signing key in cleartext on a writable filesystem is a failure mode the framework cannot detect. The audit chain’s integrity assumes operator-controlled keys are operator-controlled in fact, not just in name.
  • Out-of-distribution calibration. Stream 5’s reliability map is empirical and in-distribution. Out-of-distribution behavior is flagged by Streams 2 and 4, but reported confidence on out-of-distribution inputs is not guaranteed to be calibrated.

Why the framework is built for adversarial environments

The hardening layer (Streams 6, 7, 8) is the structural reason VERDICT WEIGHT is positioned for defense and critical infrastructure rather than for general developer tooling. Each of the three hardening streams targets a failure class that a general-purpose deployment can typically tolerate but that a defense-grade deployment cannot:
  • Curveball detection (Stream 6) is the natural defense against confidence-gated autonomous systems being weaponized against their operators.
  • Hash-chain integrity (Stream 7) is the audit primitive for after-action review, regulatory compliance, and legal discovery.
  • Registry kill switch (Stream 8) is the safe-failure-by-default behavior that operators in high-consequence environments require.
Together they distinguish a confidence layer that is deployment-ready in adversarial environments from one that is a research curiosity.
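The mechanics behind Streams 7 and 8 can be sketched with a generic hash chain and a fail-closed gate. The class names and schema here are hypothetical illustrations of the pattern, not VERDICT WEIGHT's actual implementation: each audit entry commits to its predecessor's hash, so offline edits break every later link, and the scorer refuses to score once verification fails.

```python
import hashlib
import json

GENESIS = "0" * 64

def _entry_hash(prev_hash, record):
    """Hash the record together with the previous entry's hash."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class AuditChain:
    """Append-only log: tampering with any past entry is detectable."""
    def __init__(self):
        self.entries = []  # list of (record, entry_hash)

    def append(self, record):
        prev = self.entries[-1][1] if self.entries else GENESIS
        self.entries.append((record, _entry_hash(prev, record)))

    def verify(self):
        prev = GENESIS
        for record, h in self.entries:
            if _entry_hash(prev, record) != h:
                return False
            prev = h
        return True

class GatedScorer:
    """Fail-closed sketch: scoring is disabled on integrity failure."""
    def __init__(self, chain):
        self.chain = chain

    def score(self, item, confidence):
        if not self.chain.verify():
            raise RuntimeError("integrity failure: scoring disabled until an operator restores trust")
        self.chain.append({"item": item, "confidence": confidence})
        return confidence

chain = AuditChain()
scorer = GatedScorer(chain)
scorer.score("action-1", 0.91)
scorer.score("action-2", 0.73)
assert chain.verify()

# An offline rewrite of a past record is tamper-evident: the stored hash
# no longer matches, and the gated scorer refuses further scoring.
chain.entries[0] = ({"item": "action-1", "confidence": 0.99}, chain.entries[0][1])
assert not chain.verify()
```

The design choice worth noting is that the gate fails closed: an unverifiable chain halts scoring rather than continuing with an unauditable basis, which is the safe-failure-by-default behavior described above.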

Mapping the threat model to deployment scenarios

| Deployment | Most relevant streams |
| --- | --- |
| Defense / national security | All eight; Stream 6 is the differentiator |
| Critical infrastructure | All eight; Streams 7 and 8 are the audit/compliance anchors |
| Regulated industry (healthcare, finance, legal) | Streams 1–5, plus Stream 7 for audit |
| Internal tooling | Streams 1–5; hardening optional |
| Research / experimentation | Configurable; ablation-friendly |
See Pilot engagement for how to map a specific deployment scenario to a configuration.
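One way such a mapping might look in configuration is sketched below. The scenario keys and the flat stream-list schema are hypothetical, chosen only to mirror the table above; they are not VERDICT WEIGHT's actual configuration format.

```python
# Hypothetical scenario-to-streams mapping mirroring the deployment table.
# Keys and schema are illustrative, not the framework's real config.
SCENARIO_STREAMS = {
    "defense":                 list(range(1, 9)),   # all eight; Stream 6 differentiates
    "critical_infrastructure": list(range(1, 9)),   # Streams 7, 8 anchor audit/compliance
    "regulated_industry":      [1, 2, 3, 4, 5, 7],  # core streams plus audit
    "internal_tooling":        [1, 2, 3, 4, 5],     # hardening optional
}

def streams_for(scenario):
    """Return the enabled stream numbers for a named deployment scenario."""
    try:
        return SCENARIO_STREAMS[scenario]
    except KeyError:
        raise ValueError(f"unknown scenario: {scenario!r}") from None

print(streams_for("regulated_industry"))  # [1, 2, 3, 4, 5, 7]
```

A research deployment would instead toggle streams individually for ablation, which is why it has no fixed entry in the sketch.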