Thresholds are policy, not engineering
VERDICT WEIGHT produces a calibrated confidence value. It does not, and cannot, decide what level of confidence is sufficient to act on. That decision depends on the cost of acting wrongly versus the cost of escalating, which is a property of the deployment, not of the framework. This page documents the available thresholds, how they interact, and a defensible procedure for choosing them.
The four thresholds
| Threshold | Default | Effect |
|---|---|---|
| action_threshold | 0.85 | Above this, should_act is True. |
| escalation_threshold | 0.60 | Below this, the result is flagged for human review even if no abstention triggered. |
| abstention_coherence_min | 0.40 | Below this Stream 4 coherence value, the framework abstains. |
| abstention_epistemic_max | 0.70 | Above this Stream 2 epistemic value, the framework abstains. |
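As a minimal sketch, the four defaults could be held in a single validated configuration object. The `Thresholds` class below is hypothetical and is not part of the VERDICT WEIGHT API; only the field names and default values come from the table above. The range and ordering checks are assumptions: that every compared value lies in [0, 1], and that the escalation threshold should sit below the action threshold, which the defaults satisfy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the four thresholds (defaults from the table)."""

    action_threshold: float = 0.85
    escalation_threshold: float = 0.60
    abstention_coherence_min: float = 0.40
    abstention_epistemic_max: float = 0.70

    def __post_init__(self) -> None:
        # Assumption: confidence and stream values all lie in [0, 1].
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")
        # Assumption: escalation sits below action, otherwise the review band is empty.
        if self.escalation_threshold >= self.action_threshold:
            raise ValueError("escalation_threshold must be below action_threshold")


defaults = Thresholds()
stricter = Thresholds(action_threshold=0.95, escalation_threshold=0.70)
```

Failing fast on an inverted pair keeps a misconfigured deployment from silently running with no review band.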
Visualizing the regions
A scoring outcome falls into one of four regions, determined by the calibrated confidence and the abstention rules:
| Region | Confidence band | Outcome |
|---|---|---|
| Act | confidence >= action_threshold and no abstention | score, should_act = True |
| Review | escalation_threshold <= confidence < action_threshold | score, should_act = False, flagged |
| Decline | confidence < escalation_threshold and no abstention | score, should_act = False |
| Abstain | abstention rule triggered | abstain |
This table does not cover abort outcomes raised by hardening streams; those are not governed by these thresholds.
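The region table can be read as a small decision function. The sketch below is an illustration, not the framework's implementation: `classify` and its parameter names are hypothetical, the abstention check is assumed to run before the confidence bands, and abort outcomes are left out because they are not governed by these thresholds.

```python
def classify(
    confidence: float,
    coherence: float,
    epistemic: float,
    *,
    action_threshold: float = 0.85,
    escalation_threshold: float = 0.60,
    abstention_coherence_min: float = 0.40,
    abstention_epistemic_max: float = 0.70,
) -> str:
    """Map one scoring result to a region name (hypothetical helper)."""
    # Assumption: abstention rules are evaluated before the confidence bands.
    if coherence < abstention_coherence_min or epistemic > abstention_epistemic_max:
        return "abstain"
    if confidence >= action_threshold:
        return "act"
    if confidence >= escalation_threshold:
        return "review"
    return "decline"


print(classify(0.91, coherence=0.80, epistemic=0.20))  # act
print(classify(0.72, coherence=0.80, epistemic=0.20))  # review
print(classify(0.30, coherence=0.80, epistemic=0.20))  # decline
print(classify(0.91, coherence=0.25, epistemic=0.20))  # abstain
```

The last call shows that a high confidence does not override an abstention rule: the coherence value of 0.25 is below the 0.40 minimum, so the result is an abstention.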
Choosing action_threshold
The defensible procedure:
Estimate the cost of acting wrongly
Quantify, in whatever currency makes sense for the deployment, the cost of an incorrect action. Use a representative adverse case, not a worst case.
Estimate the cost of escalating
Quantify the cost of routing the decision to human review. This is rarely zero — humans are expensive, slow, and themselves miscalibrated.
Compute the breakeven correctness rate
The breakeven correctness rate is cost_of_wrong_action / (cost_of_escalation + cost_of_wrong_action). It rises toward 1 as a wrong action becomes more costly relative to escalation. The action threshold should be set to a value at or above this rate, expressed as a calibrated confidence.
Choosing escalation_threshold
The escalation threshold is the value below which a non-abstain score outcome should still be flagged for review. It serves a different purpose from the action threshold: it captures the operational reality that not acting is not the same as being safe. A confidence of 0.30 is weak evidence; a sustained pattern of low-confidence non-actions may itself be a signal that warrants attention.
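A reasonable default is 30–50% below the action threshold, which puts the documented defaults (0.60 against 0.85, about 29% lower in relative terms) near the shallow end of that range. A defensible procedure is to set it at the level where the marginal cost of human review begins to exceed the cost of taking no action at all.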
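The two procedures, a breakeven rate for the action threshold and an escalation threshold set some fraction below it, can be sketched together. Both helpers are hypothetical and the cost figures are illustrative only. The breakeven helper puts the wrong-action cost in the numerator, so that the rate rises as errors become costlier, and its docstring states the cost model it assumes. The escalation helper assumes that "30–50% below" means a relative reduction rather than percentage points.

```python
def breakeven_correctness_rate(cost_of_wrong_action: float, cost_of_escalation: float) -> float:
    """Correctness rate at which acting and escalating have equal expected cost.

    Assumed model: acting costs cost_of_wrong_action when wrong and nothing when
    right; escalating costs cost_of_escalation only when the action would have
    been right (the review was unnecessary).
    """
    if cost_of_wrong_action <= 0 or cost_of_escalation <= 0:
        raise ValueError("costs must be positive")
    return cost_of_wrong_action / (cost_of_wrong_action + cost_of_escalation)


def default_escalation_threshold(action_threshold: float, relative_drop: float = 0.30) -> float:
    """Escalation threshold set a relative fraction below the action threshold.

    Assumption: the drop is a relative reduction, not percentage points.
    """
    if not 0.0 < relative_drop < 1.0:
        raise ValueError("relative_drop must be between 0 and 1")
    return action_threshold * (1.0 - relative_drop)


# Illustrative costs only: a wrong action costs 17 units, a review costs 3.
rate = breakeven_correctness_rate(cost_of_wrong_action=17.0, cost_of_escalation=3.0)
print(round(rate, 2))  # 0.85
print(round(default_escalation_threshold(0.85, relative_drop=0.30), 3))  # 0.595
print(round(default_escalation_threshold(0.85, relative_drop=0.50), 3))  # 0.425
```

Under these illustrative costs the breakeven rate equals the default action threshold of 0.85, and a 30% relative drop lands close to the default escalation threshold of 0.60.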
Choosing abstention thresholds
The two abstention thresholds (abstention_coherence_min, abstention_epistemic_max) are not policy choices in the same sense. They are detection thresholds for contradictory and out-of-distribution inputs respectively. They should be set:
- Conservatively, so that abstention triggers more readily, in deployments where downstream review is cheap.
- More tightly, so that abstention triggers only on clear cases, in deployments where excessive abstention has operational cost.
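One way to see the trade-off in the list above is to measure how often each candidate setting would abstain on held-out inputs. The sketch below assumes such inputs are available as (coherence, epistemic) pairs; the helper name and the sample values are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple


def abstention_rate(
    samples: Iterable[Tuple[float, float]],
    abstention_coherence_min: float,
    abstention_epistemic_max: float,
) -> float:
    """Fraction of (coherence, epistemic) pairs that would trigger abstention."""
    pairs = list(samples)
    if not pairs:
        raise ValueError("samples must not be empty")
    abstained = sum(
        1
        for coherence, epistemic in pairs
        if coherence < abstention_coherence_min or epistemic > abstention_epistemic_max
    )
    return abstained / len(pairs)


# Made-up (coherence, epistemic) pairs standing in for held-out stream values.
held_out = [(0.90, 0.10), (0.55, 0.30), (0.45, 0.65), (0.35, 0.20), (0.80, 0.75)]

print(abstention_rate(held_out, 0.40, 0.70))  # documented defaults -> 0.4
print(abstention_rate(held_out, 0.50, 0.60))  # abstains more readily -> 0.6
print(abstention_rate(held_out, 0.30, 0.80))  # abstains less readily -> 0.0
```

Comparing the resulting rates against review capacity gives a concrete basis for deciding how far to move either threshold.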