

Notation

Let each stream i produce a tuple $s_i = (c_i,\, w_i,\, a_i)$, where:
  • $c_i \in [0, 1]$ is the stream's confidence contribution,
  • $w_i \in [0, 1]$ is its weight in the composition,
  • $a_i \in \{0, 1\}$ is an abstention indicator: $a_i = 1$ means the stream did not produce a usable signal.
The hardening streams (6, 7, 8) additionally produce a veto bit $v_i \in \{0, 1\}$.
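To make the notation concrete, the per-stream tuple can be modeled as a small record type. This is an illustrative sketch only; `StreamOutput` and its field names are hypothetical, not part of the framework's API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StreamOutput:
    """Hypothetical container for the per-stream tuple s_i = (c_i, w_i, a_i)."""
    confidence: float   # c_i in [0, 1]
    weight: float       # w_i in [0, 1]
    abstained: bool     # a_i: True means the stream produced no usable signal
    veto: bool = False  # v_i, only meaningful for hardening streams 6-8
```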

Composition rule

The composed confidence $C$ is defined piecewise:
1. Veto check

If any $v_i = 1$ for $i \in \{6, 7, 8\}$, then $C = 0$ and the framework returns an abort outcome with the triggering stream identified.
2. Abstention check

If the number of abstaining core streams (1–5) exceeds a configured threshold $\tau_a$, then $C$ is undefined and the framework returns abstain.
3. Weighted aggregation

Otherwise, $C$ is the weight-normalized aggregate of the non-abstaining core stream contributions:

$$C = \frac{\sum_{i \in \text{core},\, a_i = 0} w_i \cdot c_i}{\sum_{i \in \text{core},\, a_i = 0} w_i}$$
4. Calibration correction

The aggregated value is then passed through the calibration map produced by Stream 5:

$$C_{\text{final}} = \phi_5(C)$$

where $\phi_5$ is the empirical reliability map fitted on held-out validation data.
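The four steps of the composition rule can be sketched end to end. This is a minimal illustration under simple list-based inputs; the function name, signature, and outcome labels are assumptions for the sketch, not the framework's API:

```python
from typing import Callable, Optional

def compose(
    core: list[tuple[float, float, bool]],        # (c_i, w_i, a_i) for streams 1-5
    hardening_vetoes: list[bool],                 # v_i for streams 6-8
    tau_a: int,                                   # abstention threshold
    phi5: Callable[[float], float] = lambda c: c, # calibration map from Stream 5
) -> tuple[str, Optional[float]]:
    # 1. Veto check: any hardening veto forces C = 0 and an abort outcome.
    if any(hardening_vetoes):
        return ("abort", 0.0)
    # 2. Abstention check: too many abstaining core streams -> abstain, C undefined.
    active = [(c, w) for c, w, abstained in core if not abstained]
    if len(core) - len(active) > tau_a:
        return ("abstain", None)
    # 3. Weighted aggregation over the non-abstaining core streams.
    raw = sum(w * c for c, w in active) / sum(w for _, w in active)
    # 4. Calibration correction via the empirical reliability map phi_5.
    return ("answer", phi5(raw))
```

Note how the veto check runs before anything else, so no core confidence can dilute it, matching the rationale given below for veto over weighting.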

Why this shape

Three properties of the composition rule are worth justifying explicitly, because each was chosen against a plausible alternative.

Veto over weighting

Hardening streams could have been folded into the weighted aggregate as additional signals. They are not. A compromised audit chain or a confirmed Curveball-style adversarial input must drop confidence to zero, not merely reduce it. Allowing those streams to be “outvoted” by high core confidence would defeat the purpose of having them.

Abstention over forced classification

When core streams disagree past threshold, the framework returns abstention rather than averaging through the disagreement. This trades coverage for calibration. In high-stakes deployments, that trade is correct: a system that declines to answer 5% of the time is more useful than one that answers everything with 80% confidence regardless of evidence quality.

Post-hoc calibration over Bayesian fusion

Calibration is applied as a post-hoc map ($\phi_5$) rather than baked into the aggregation rule. This decouples the fusion problem from the reliability problem. Fusion logic can be reasoned about in isolation; calibration can be re-fitted on new validation data without redesigning the composition.
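Because $\phi_5$ is decoupled from fusion, re-fitting it is a standalone step. The document does not specify the fitting procedure, so the sketch below uses a simple equal-width binned reliability map as an illustrative stand-in; `fit_reliability_map` is a hypothetical helper:

```python
def fit_reliability_map(raw_scores: list[float], correct: list[int]):
    """Fit an empirical reliability map on held-out data: an illustrative
    binned stand-in for phi_5, not the framework's actual fitting procedure."""
    hits, counts = [0] * 10, [0] * 10
    for s, y in zip(raw_scores, correct):
        b = min(int(s * 10), 9)  # ten equal-width bins on [0, 1]
        counts[b] += 1
        hits[b] += y
    # Empirical accuracy per bin; fall back to the bin midpoint if the bin is empty.
    acc = [hits[b] / counts[b] if counts[b] else (b + 0.5) / 10 for b in range(10)]

    def phi5(c: float) -> float:
        return acc[min(int(c * 10), 9)]

    return phi5
```

Swapping in a different fitted map (e.g. isotonic regression) changes only this step, never the aggregation rule.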

Soundness sketch

A more complete soundness argument is given in Completeness proof. The intuition:
  1. The veto check is sound because it is monotone — adding more hardening signals can only raise the rate at which veto fires, not lower it.
  2. The weighted aggregate is calibrated because $\phi_5$ is fitted on the empirical reliability of $C$ on held-out data, then validated under cross-validation.
  3. The abstention rule is conservative: it never produces a confidence value when the underlying evidence is contradictory beyond threshold.
Together these give the formal guarantee that a reported $C_{\text{final}}$ is either accompanied by an audit-verified evidence chain or the framework returned abort/abstain instead.
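The monotonicity claim in point 1 can be checked mechanically on a small example. `veto_fires` below is a hypothetical helper standing in for the veto check, and the subset enumeration is an illustration, not a proof:

```python
from itertools import combinations

def veto_fires(vetoes: list[bool]) -> bool:
    # The veto check aborts if any hardening stream raises its veto bit.
    return any(vetoes)

# Monotonicity: if the veto fires on some set of signals, it still fires
# after adding any further signal. Adding hardening streams can only raise
# the rate at which veto fires, never lower it.
signals = [True, False, False, True]
for size in range(len(signals) + 1):
    for subset in combinations(signals, size):
        if veto_fires(list(subset)):
            assert veto_fires(list(subset) + [False])
            assert veto_fires(list(subset) + [True])
```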

Worked example

Consider a decision where:
  • Streams 1–5 produce $(0.91, 0.88, 0.84, 0.79, 0.86)$ with equal weights.
  • All hardening streams pass (no veto).
  • No abstention triggered.
The raw aggregate is:

$$C = \frac{0.91 + 0.88 + 0.84 + 0.79 + 0.86}{5} = 0.856$$

After calibration ($\phi_5$ on held-out data shows that a raw 0.856 corresponds to an empirical correctness of approximately 0.83):

$$C_{\text{final}} \approx 0.83$$

The reported confidence is lower than the raw aggregate. This is the framework working as intended: raw aggregates are systematically overconfident, and calibration corrects for it.
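The arithmetic of the worked example can be reproduced directly. The calibrated value 0.83 is the empirical figure quoted in the text, hard-coded here for illustration rather than computed:

```python
scores = [0.91, 0.88, 0.84, 0.79, 0.86]  # streams 1-5
weights = [1.0] * 5                      # equal weights

# Weight-normalized aggregate over the (non-abstaining) core streams.
raw = sum(w * c for w, c in zip(weights, scores)) / sum(weights)
print(round(raw, 3))  # -> 0.856

# phi_5 maps raw 0.856 to ~0.83 on held-out data (value taken from the text).
calibrated = 0.83
assert calibrated < raw  # calibration corrects systematic overconfidence
```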