Philosophy

VERDICT WEIGHT exposes a deliberately small hyperparameter surface. Each parameter has a documented default that has been validated under the IEEE hardening protocol. The defaults are conservative (biased toward higher abstention and lower false-positive rates) so that an out-of-the-box deployment fails safely rather than producing overconfident scores. Tune parameters only with concrete validation data from your deployment domain.

Per-stream weights

Each core stream (1–5) has a weight in the composition. Defaults are equal:
[streams.weights]
evidence_aggregation = 1.0
uncertainty_quantification = 1.0
temporal_stability = 1.0
cross_source_coherence = 1.0
calibration = 1.0
Weights are normalized at composition time, so absolute magnitudes do not matter; only ratios do.
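
The ratio-only behaviour can be illustrated with a minimal Python sketch. It assumes normalization means dividing each weight by the sum of all weights; the function name `normalize_weights` is hypothetical and is not part of the framework's API.

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    """Scale weights so they sum to 1.0 (assumed normalization scheme)."""
    if any(value < 0 for value in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: value / total for name, value in weights.items()}


defaults = {
    "evidence_aggregation": 1.0,
    "uncertainty_quantification": 1.0,
    "temporal_stability": 1.0,
    "cross_source_coherence": 1.0,
    "calibration": 1.0,
}

# Doubling every weight leaves the normalized result unchanged.
doubled = {name: 2.0 * value for name, value in defaults.items()}
assert normalize_weights(defaults) == normalize_weights(doubled)

# Changing one ratio does change the result: calibration now counts twice
# as much as any other stream.
emphasized = normalize_weights({**defaults, "calibration": 2.0})
assert emphasized["calibration"] == 2 * emphasized["evidence_aggregation"]
```

Under this assumption, setting every weight to 5.0 is equivalent to leaving the defaults in place.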

Action threshold

The threshold above which should_act is set:
[gating]
action_threshold = 0.85   # default
This is the most consequential single parameter in the framework. A higher threshold means fewer actions and more escalations; a lower threshold means more actions and a higher rate of acting on weak evidence. The right value depends entirely on the cost asymmetry of false positives versus false negatives in your deployment.
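
One way to reason about the cost asymmetry is the standard break-even rule from decision theory, sketched below in Python. This is not the framework's own method. It assumes the score behaves like a calibrated probability that acting is correct, and the function name and cost values are hypothetical.

```python
def break_even_threshold(cost_false_positive: float,
                         cost_false_negative: float) -> float:
    """Score above which acting has lower expected cost than not acting.

    With p the probability that acting is correct (assumption: the score
    is calibrated), acting costs (1 - p) * cost_false_positive in
    expectation and not acting costs p * cost_false_negative. Acting is
    cheaper once p exceeds the ratio returned here.
    """
    if cost_false_positive <= 0 or cost_false_negative <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


# Hypothetical costs: acting wrongly costs 17 units, failing to act costs 3.
threshold = break_even_threshold(17.0, 3.0)
assert abs(threshold - 0.85) < 1e-12

# Symmetric costs put the break-even point at 0.5.
assert break_even_threshold(1.0, 1.0) == 0.5
```

The value such a rule yields is only as good as the cost estimates and the calibration of the score, so treat it as a starting point for validation rather than a setting to copy.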

Abstention rules

[abstention]
core_abstention_max = 2          # max abstaining core streams before forced abstain
coherence_min = 0.4              # min Stream 4 coherence before forced abstain
epistemic_max = 0.7              # max Stream 2 epistemic before forced abstain
The defaults make abstention easy to trigger. This is intentional: a deployment that abstains too often is correctable; a deployment that fails to abstain when it should is dangerous.
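
The three rules above can be read as a single predicate. The Python sketch below is an interpretation, not the framework's implementation: it assumes that any one rule is sufficient to force abstention and that the comparisons are strict. `AbstentionRules` and `must_abstain` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstentionRules:
    core_abstention_max: int = 2   # default from [abstention]
    coherence_min: float = 0.4     # default from [abstention]
    epistemic_max: float = 0.7     # default from [abstention]


def must_abstain(rules: AbstentionRules,
                 abstaining_core_streams: int,
                 coherence: float,
                 epistemic: float) -> bool:
    """Return True if any rule forces abstention (assumed strict comparisons)."""
    return (
        abstaining_core_streams > rules.core_abstention_max
        or coherence < rules.coherence_min
        or epistemic > rules.epistemic_max
    )


rules = AbstentionRules()
# All inputs within bounds: no forced abstention.
assert not must_abstain(rules, abstaining_core_streams=1, coherence=0.8, epistemic=0.3)
# Low Stream 4 coherence alone is enough to force abstention.
assert must_abstain(rules, abstaining_core_streams=0, coherence=0.35, epistemic=0.3)
```

Whether a value exactly at a bound (for example, exactly two abstaining core streams) triggers abstention is not specified here; the sketch treats boundary values as acceptable.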

Stream 6 (Curveball) sensitivity

[streams.sis]
significance_threshold = 0.95    # confidence level for veto trigger
fingerprint_perturbations = 8    # number of perturbations per evaluation
Higher significance_threshold means fewer false vetoes but more missed detections. Operators should retune this against representative adversarial test cases for their deployment.
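
Retuning can be approached as a sweep over labelled test cases. The Python sketch below assumes a veto fires when a per-case detector confidence is at or above `significance_threshold`; that rule, the function name, and the toy data are all illustrative assumptions, not measured results.

```python
def veto_error_counts(cases: list[tuple[float, bool]],
                      significance_threshold: float) -> tuple[int, int]:
    """Count (false_vetoes, missed_detections) for one threshold.

    Each case is (detector_confidence, is_adversarial).
    """
    false_vetoes = sum(
        1 for confidence, adversarial in cases
        if confidence >= significance_threshold and not adversarial
    )
    missed_detections = sum(
        1 for confidence, adversarial in cases
        if confidence < significance_threshold and adversarial
    )
    return false_vetoes, missed_detections


# Toy data, invented for illustration only.
toy_cases = [
    (0.99, True), (0.97, True), (0.92, True),
    (0.96, False), (0.85, False), (0.60, False),
]

for threshold in (0.90, 0.95, 0.98):
    false_vetoes, missed = veto_error_counts(toy_cases, threshold)
    print(f"threshold={threshold:.2f} false_vetoes={false_vetoes} missed={missed}")
```

On this toy set, raising the threshold from 0.90 to 0.98 removes the one false veto but misses two of the three adversarial cases, which is the trade-off described above.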

Stream 3 (Temporal stability) cost knob

[streams.temporal]
perturbation_count = 3           # default; raise for stability resolution, lower for latency
strategy = "paraphrase"          # one of: paraphrase, reorder, resample, mixed
This is the most expensive single setting in the framework, since perturbation_count controls how many additional model evaluations happen per scoring call. Latency-sensitive deployments may set perturbation_count to 1 and rely on Stream 4 for the same protection.
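
A rough way to budget for this knob is sketched below in Python. It assumes one baseline evaluation plus one evaluation per perturbation, run sequentially; the framework may batch or parallelize differently, and both function names are hypothetical.

```python
def evaluations_per_call(perturbation_count: int) -> int:
    """Model evaluations per scoring call (assumed: 1 baseline + 1 per perturbation)."""
    if perturbation_count < 0:
        raise ValueError("perturbation_count must be non-negative")
    return 1 + perturbation_count


def estimated_latency_ms(perturbation_count: int, per_evaluation_ms: float) -> float:
    """Sequential latency estimate; ignores batching and network overhead."""
    return evaluations_per_call(perturbation_count) * per_evaluation_ms


# With the default of 3 perturbations and a hypothetical 200 ms per evaluation:
assert evaluations_per_call(3) == 4
assert estimated_latency_ms(3, 200.0) == 800.0

# Dropping to 1 perturbation halves the estimate under the same assumptions.
assert estimated_latency_ms(1, 200.0) == 400.0
```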

Audit chain

[audit]
log_path = "/var/log/verdict-weight/chain.log"
signing_key_id = "ops-key-2026"
checkpoint_every = 10000         # records before checkpoint rotation
signing_key_id is required for audit-bound deployments. The framework will run without it but will warn loudly on every startup; this is intentional.
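
Operators who want a hard failure instead of a warning can add their own preflight check. The Python sketch below is a hypothetical operator-side helper, not framework behaviour: it assumes the [audit] table has already been parsed into a dictionary and that the operator knows whether the deployment is audit-bound.

```python
import warnings


def check_audit_config(audit: dict, audit_bound: bool) -> None:
    """Fail fast for audit-bound deployments that lack a signing key.

    Hypothetical preflight: raises for audit-bound deployments, and
    otherwise warns, mirroring the startup warning described above.
    """
    if audit.get("signing_key_id"):
        return
    if audit_bound:
        raise ValueError("signing_key_id is required for audit-bound deployments")
    warnings.warn("audit.signing_key_id is not set; audit records will be unsigned")


# A complete [audit] table passes silently.
check_audit_config(
    {
        "log_path": "/var/log/verdict-weight/chain.log",
        "signing_key_id": "ops-key-2026",
        "checkpoint_every": 10000,
    },
    audit_bound=True,
)
```

Running such a check in a deployment pipeline catches a missing key before the service starts, rather than relying on someone reading startup logs.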

| Deployment type | Action threshold | Stream 3 perturbations | Stream 6 significance |
| --- | --- | --- | --- |
| Defense / national security | 0.95 | 5+ | 0.99 |
| Regulated industry | 0.90 | 3 | 0.95 |
| Internal tooling | 0.80 | 1–3 | 0.90 |
| Research / experimentation | 0.70 | 1 | 0.80 |

These are starting points, not endorsements. Validate against domain-representative data.

Sensitivity analysis

The empirical sensitivity of headline metrics to each hyperparameter is reported in Paper 2. The summary: VERDICT WEIGHT’s quality metrics are robust to ±20% perturbation around the defaults across all configurable parameters. This is the basis for the “tuning is not a deployment-blocking step” claim — defaults work reasonably out of the box, and tuning beyond defaults yields incremental rather than categorical improvement.