The Curveball attack class
A standard adversarial attack tries to flip a model’s prediction. A Curveball attack tries something subtler and, in autonomous deployments, more dangerous: it flips the model’s confidence while leaving the prediction intact.
The threat is straightforward. If an autonomous system gates its actions on confidence, an attacker who can drive confidence above the action threshold without changing the predicted class can induce the system to take an action the operator never authorized. Conversely, an attacker who can drive confidence below the threshold can induce silent abstention, freezing a system that should have acted.
Curveball attacks are not theoretical. They are the obvious next step once confidence-gated autonomy becomes the deployment norm.
Stream 6: SIS (Stability Integrity Signal)
Stream 6 is the framework’s defense against this class. It runs in parallel with the core scoring streams and produces a binary veto signal that, when raised, drives the composed confidence to zero and routes the decision to abort.
What the stream does
Compute a stability fingerprint of the input
The input is examined under perturbations specifically constructed to expose Curveball-style manipulation — perturbations that should change observable confidence in predictable ways for benign inputs and in anomalous ways for adversarial ones.
Compare against the expected fingerprint band
The fingerprint is compared to the expected band derived from in-distribution validation data. Out-of-band fingerprints indicate the input is exhibiting non-natural confidence dynamics.
Raise the veto if the deviation is significant
A configurable significance threshold determines when the deviation is large enough to raise the veto. The threshold is set conservatively by default; operators can tighten or loosen it based on their tolerance for false positives versus false negatives.
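The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's implementation. The function names (`stability_fingerprint`, `fit_band`, `raise_veto`), the Gaussian-noise probes, the mean-absolute-shift fingerprint, and the z-score significance test are all hypothetical stand-ins; the text does not specify how the perturbations or the expected band are actually constructed.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple

# Hypothetical type: any function mapping an input vector to a confidence in [0, 1].
ConfidenceFn = Callable[[Sequence[float]], float]


def stability_fingerprint(
    confidence_fn: ConfidenceFn,
    x: Sequence[float],
    n_probes: int = 32,
    scale: float = 0.01,
    seed: int = 0,
) -> float:
    """Step 1 (assumed form): mean absolute confidence shift under small probes.

    Gaussian noise is a placeholder for the purpose-built perturbations
    described in the text.
    """
    rng = random.Random(seed)
    base = confidence_fn(x)
    shifts = []
    for _ in range(n_probes):
        probe = [v + rng.gauss(0.0, scale) for v in x]
        shifts.append(abs(confidence_fn(probe) - base))
    return statistics.fmean(shifts)


def fit_band(validation_fingerprints: Sequence[float]) -> Tuple[float, float]:
    """Step 2 (assumed form): summarize in-distribution fingerprints as (mean, stdev)."""
    return (
        statistics.fmean(validation_fingerprints),
        statistics.stdev(validation_fingerprints),
    )


def raise_veto(
    fingerprint: float,
    band: Tuple[float, float],
    z_threshold: float = 4.0,
) -> bool:
    """Step 3 (assumed form): veto when the fingerprint is far outside the band."""
    mean, std = band
    if std == 0.0:
        return fingerprint != mean
    return abs(fingerprint - mean) / std > z_threshold
```

In this sketch, raising `z_threshold` loosens the detector (fewer false positives, more false negatives) and lowering it tightens the detector, which mirrors the operator trade-off described above.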
Why this is in the framework rather than upstream
Curveball detection could in principle live in the model, in the inference server, or in a dedicated adversarial-input filter upstream of decisioning. The framework’s position is that it must live adjacent to confidence scoring for two reasons:
- The signal it acts on is confidence dynamics. That is precisely what the framework is computing. Doing detection here removes the round-trip and the opportunity for inconsistency.
- The veto must be authoritative. A detection in an upstream filter is advisory; a Stream 6 veto in the framework is binding through the composition rule. The cost of a missed Curveball detection is operationally severe; the framework treats it as such.
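To illustrate what "binding through the composition rule" could mean in practice, here is a minimal sketch. The names `Decision` and `compose`, the three action labels, and the default threshold of 0.9 are assumptions made for this example; only the behavior on veto (composed confidence forced to zero, decision routed to abort) comes from the description of Stream 6 above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    confidence: float
    action: str  # "act", "abstain", or "abort" (hypothetical labels)


def compose(
    core_confidence: float,
    sis_veto: bool,
    action_threshold: float = 0.9,  # assumed value, not from the source
) -> Decision:
    """Combine the core confidence with the Stream 6 veto.

    The veto is checked first, so no core confidence value can override it.
    """
    if sis_veto:
        # Binding veto: confidence is zeroed and the decision is aborted.
        return Decision(confidence=0.0, action="abort")
    if core_confidence >= action_threshold:
        return Decision(confidence=core_confidence, action="act")
    return Decision(confidence=core_confidence, action="abstain")
```

An advisory upstream filter would leave the final decision to whatever consumes its output. Here the veto sits inside the composition itself, so an input with an inflated confidence of 0.97 still aborts when `sis_veto` is true.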
What is and is not claimed
What is claimed:
- Stream 6 detects naive Curveball attacks at high rates with low false-positive rates. The empirical numbers are in Curveball attack class.
- Stream 6 detects adaptive Curveball attacks at meaningfully elevated cost (the attacker must do measurably more work to evade detection).
- Removing Stream 6 from the composition empirically re-admits the attack class entirely. This is shown in Ablation studies.