Documentation Index
Fetch the complete documentation index at: https://verdictweight.dev/llms.txt
Use this file to discover all available pages before exploring further.
Purpose
Modern model outputs are unstable. Re-phrasing a prompt, re-ordering retrieved context, or sampling at a different temperature can flip the model’s confidence dramatically, even when the underlying decision should not change. Stream 3 measures that instability and penalizes it. The framework’s position is that if confidence depends on the order in which evidence was presented, the confidence is not real.

What the stream does
Generate semantically equivalent perturbations
Either at scoring time or via cached precomputed variants, the framework evaluates the same decision under a small number of equivalent input formulations.
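As a concrete sketch of the perturbation step, the snippet below implements only the "reorder" strategy (shuffling retrieved context passages), since it needs no model calls; the function name and prompt layout are illustrative assumptions, not the framework's API.

```python
import random

def reorder_variants(question, passages, n=4, seed=0):
    """Build n prompts that differ only in the order of retrieved context.

    Each variant is semantically equivalent: same question, same passages,
    different presentation order. (Hypothetical helper, not the framework's API.)
    """
    rng = random.Random(seed)  # fixed seed keeps variants reproducible across runs
    variants = []
    for _ in range(n):
        shuffled = passages[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        variants.append("\n".join(shuffled) + "\n\nQ: " + question)
    return variants

vs = reorder_variants("Is the claim supported?",
                      ["passage A", "passage B", "passage C"])
```

Paraphrase and resample strategies would slot into the same interface, each returning a list of equivalent formulations to score.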
Measure confidence variance across variants
The variance (or a robust analog such as the median absolute deviation) of the confidence values across variants quantifies the decision’s instability.
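The measurement step above can be sketched in a few lines, assuming a list of confidence values already scored on the equivalent variants; the function name is illustrative, and the MAD is shown as one possible robust analog of the variance.

```python
from statistics import pvariance, median

def stability_penalty(confidences):
    """Quantify instability of a decision across equivalent input variants.

    Returns both the population variance and a robust analog
    (median absolute deviation), so outlier variants don't dominate.
    """
    var = pvariance(confidences)
    med = median(confidences)
    mad = median(abs(c - med) for c in confidences)
    return var, mad

# Confidences from four semantically equivalent phrasings of the same decision;
# the 0.42 reading signals the volatility Stream 3 is designed to catch.
variants = [0.91, 0.88, 0.42, 0.90]
var, mad = stability_penalty(variants)
```

Note how the MAD stays small here while the variance is inflated by the single outlier; which signal to penalize on is a design choice the framework leaves to configuration.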
Why this matters in practice
The most insidious failure mode of LLM-based decisioning is plausible volatility: the system reports 0.9 confidence on one phrasing and 0.4 on a paraphrase, with neither answer obviously wrong. Without Stream 3, both readings get folded into downstream pipelines as if they were the same kind of signal. They are not. One of them is, by definition, miscalibrated.

This stream is also what makes the framework robust to prompt-ordering attacks: adversarial reformulations that are not strong enough to flip the prediction but are strong enough to inflate confidence. Those attacks raise variance across variants, which the stream catches.

Configuration surface
- Number of perturbations — the trade-off between scoring latency and stability resolution.
- Perturbation strategy — paraphrase, reorder, resample, or a combination.
- Variance threshold for abstention — if variance exceeds the threshold, the stream can drive abstention rather than merely discounting confidence.
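The configuration surface and the abstain-versus-discount rule can be sketched together; the field names (`n_perturbations`, `strategy`, `abstain_threshold`) and the simple mean-minus-variance discount are assumptions for illustration, not the framework's documented schema.

```python
from dataclasses import dataclass
from statistics import pvariance

@dataclass
class Stream3Config:
    n_perturbations: int = 4          # trade-off: scoring latency vs. stability resolution
    strategy: str = "paraphrase"      # "paraphrase" | "reorder" | "resample" (or a mix)
    abstain_threshold: float = 0.02   # variance above this drives abstention

def decide(confidences, cfg):
    """Abstain on high variance; otherwise discount confidence by instability."""
    var = pvariance(confidences)
    if var > cfg.abstain_threshold:
        return ("abstain", var)
    mean = sum(confidences) / len(confidences)
    return ("accept", mean - var)     # hypothetical discount rule for illustration

cfg = Stream3Config()
volatile = decide([0.91, 0.88, 0.42, 0.90], cfg)   # high variance across variants
stable = decide([0.90, 0.90, 0.88, 0.91], cfg)     # consistent across variants
```

The volatile reading abstains while the stable one passes through nearly undiscounted, which is exactly the asymmetry the stream is meant to enforce.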