
The category

Several open-source libraries provide calibration as a focused, well-implemented utility for machine-learning practitioners:
  • Netcal — comprehensive Python library for confidence calibration with multiple methods.
  • Uncertainty Toolbox — uncertainty quantification utilities with calibration metrics.
  • scikit-learn CalibratedClassifierCV — calibration as a wrapper around classifiers.
  • TensorFlow Probability / Pyro — broader uncertainty-quantification frameworks that include calibration.
These libraries are excellent at what they do. They are research-grade, well-tested, and widely used. They are also a different category of thing from VERDICT WEIGHT.

What calibration libraries do

The category is focused and well-defined:
  • Take a vector of model probabilities and a vector of ground-truth labels.
  • Fit a calibration map (Platt scaling, isotonic regression, temperature scaling, etc.).
  • Apply the fitted map to new model outputs.
  • Optionally provide reliability diagrams and calibration metrics.
That is, essentially, the whole product. The libraries are imported into research notebooks, used in model-development pipelines, and sometimes deployed as a thin wrapper at inference time.
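A minimal sketch of that workflow, using scikit-learn's isotonic regression and reliability-curve helper. The probability vectors and the bin count are toy values chosen for illustration:

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression

# A vector of model probabilities and a vector of ground-truth labels (held-out data).
val_probs = np.array([0.95, 0.80, 0.70, 0.60, 0.30, 0.20, 0.90, 0.40])
val_labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])

# Fit a calibration map (isotonic here; Platt or temperature scaling are analogous).
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(val_probs, val_labels)

# Apply the fitted map to new model outputs at inference time.
new_probs = np.array([0.85, 0.55, 0.10])
calibrated = iso.predict(new_probs)

# Optionally inspect reliability: mean predicted probability vs. observed frequency per bin.
prob_true, prob_pred = calibration_curve(val_labels, val_probs, n_bins=5)
```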

What VERDICT WEIGHT does that calibration libraries do not

Calibration is one of the framework’s eight streams (Stream 5). The other seven are not in scope for any calibration library.
| Capability | Calibration library | VERDICT WEIGHT |
| --- | --- | --- |
| Fit a calibration map | Yes | Yes (Stream 5) |
| Apply the map to new outputs | Yes | Yes (Stream 5) |
| Reliability diagrams and metrics | Yes | Yes (Stream 5 + validation pipeline) |
| Evidence aggregation across heterogeneous sources | No | Yes (Stream 1) |
| Aleatoric / epistemic uncertainty decomposition | Limited | Yes (Stream 2) |
| Temporal stability / drift detection per call | No | Yes (Stream 3) |
| Cross-source coherence checking | No | Yes (Stream 4) |
| Adversarial input detection | No | Yes (Stream 6) |
| Cryptographic audit chain | No | Yes (Stream 7) |
| Registry kill switch | No | Yes (Stream 8) |
| Composition rule with veto priority | No | Yes |
The category difference: calibration libraries cover one of the eight things VERDICT WEIGHT does. They are not the layer; they are an input to the layer.
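To make "an input to the layer" concrete, here is a hypothetical composition sketch. The stream names, the veto flag, and the compose function are illustrative only and are not the framework's actual API or composition rule; the point is where a calibration library's output sits relative to the other streams and to veto priority:

```python
from dataclasses import dataclass


@dataclass
class StreamResult:
    score: float        # per-stream confidence contribution in [0, 1]
    veto: bool = False  # a vetoing stream forces rejection regardless of scores


def compose(results: dict, threshold: float = 0.9) -> bool:
    """Illustrative composition: any veto wins; otherwise gate on the weakest stream."""
    if any(r.veto for r in results.values()):
        return False
    return min(r.score for r in results.values()) >= threshold


# The calibration library's output is one entry among several -- an input, not the layer.
results = {
    "stream_1_evidence_aggregation": StreamResult(0.97),
    "stream_2_uncertainty_decomposition": StreamResult(0.95),
    "stream_5_calibration": StreamResult(0.93),  # e.g. an isotonic-calibrated confidence
    "stream_6_adversarial_detection": StreamResult(0.99, veto=False),
}
should_act = compose(results)  # False if any stream vetoes or any score is below threshold
```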

What calibration libraries do that VERDICT WEIGHT does not

| Capability | Calibration library | VERDICT WEIGHT |
| --- | --- | --- |
| Multiple calibration methods (Platt, isotonic, temperature, beta, etc.) | Yes | Limited (one method, justified) |
| Multi-class calibration with detailed taxonomy | Yes | Limited |
| Distributional calibration | Yes (Uncertainty Toolbox) | No |
| Calibration of regression outputs | Yes | No (decisional, not regression) |
| Lightweight, narrow scope | Yes | No (eight streams composed) |
| Direct integration into ML training loops | Yes | No (operates on inference outputs) |
If the operational need is “I have a classifier and I want its outputs better calibrated,” a calibration library is the right tool. The framework’s scope is broader and is justified only when the broader scope is what the deployment requires.
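For that narrow case, the wrapper pattern the table refers to is a few lines of scikit-learn. The dataset, model choice, and fold count below are placeholders:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data standing in for your classifier's training set.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Wrap the uncalibrated classifier; cross-validated sigmoid (Platt) scaling.
clf = CalibratedClassifierCV(RandomForestClassifier(n_estimators=100), method="sigmoid", cv=5)
clf.fit(X_train, y_train)
calibrated_probs = clf.predict_proba(X_test)[:, 1]
```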

When to use which

The decision is straightforward:

Use a calibration library when

  • You are improving a single classifier’s calibration as part of model development.
  • You do not need adversarial-input detection.
  • You do not need a cryptographic audit primitive.
  • You do not need a kill switch.
  • You are operating in a research or low-stakes deployment context.

Use VERDICT WEIGHT when

  • The deployment is gated on confidence in production.
  • You need composed adversarial / audit / governance primitives, not just calibration.
  • You need calibration plus seven other streams of failure-mode coverage.
  • The deployment is high-stakes (defense, regulated industry, critical infrastructure).
  • Audit and tamper-evidence are operational requirements, not nice-to-haves.

A specific point: calibration is necessary but not sufficient

The most common reason to misjudge this comparison is to assume calibration alone is enough. It is not, for two reasons:
  1. Calibration is in-distribution. A reliability map fitted on validation data does not extrapolate. Out-of-distribution inputs are detected by Streams 2 and 4, not by Stream 5; see the sketch after this list. A deployment with calibration but no OOD detection is calibrated until it isn’t, and then silently miscalibrated.
  2. Calibration does not detect adversarial inputs. An adversary who can drive confidence above the action threshold without changing the prediction is a real threat in autonomous systems. A calibration library, by design, does not address this. Stream 6 does.
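The first point can be shown with a toy experiment: the fitted calibration map is a pure function of the score, so it applies the same correction whether or not the input that produced the score resembles the validation data. A separate distribution check is what catches the shift. The Mahalanobis-distance check below only illustrates that idea; it is not the framework's actual Stream 2/4 method, and the threshold is a placeholder chosen for this toy data:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# In-distribution data: two well-separated Gaussian classes.
X_train = np.vstack([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
y_train = np.array([0] * 500 + [1] * 500)
clf = LogisticRegression().fit(X_train, y_train)

# Fit the calibration map on in-distribution validation data.
X_val = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y_val = np.array([0] * 200 + [1] * 200)
iso = IsotonicRegression(out_of_bounds="clip").fit(clf.predict_proba(X_val)[:, 1], y_val)

# Out-of-distribution inputs: far from anything the map was fitted on.
X_ood = rng.normal(10, 1, (5, 2))
ood_conf = iso.predict(clf.predict_proba(X_ood)[:, 1])  # still returns "calibrated"-looking numbers

# The calibration map cannot flag the shift; a separate check can (illustration only).
mu, cov = X_train.mean(axis=0), np.cov(X_train, rowvar=False)
diff = X_ood - mu
dist = np.sqrt(np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff))  # Mahalanobis distance
is_ood = dist > 4.0  # placeholder threshold for this toy data
```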
A deployment that needs only calibration does not need VERDICT WEIGHT. A deployment that needs calibration plus the things calibration alone does not provide is exactly what the framework is built for.

Citation context

If a reviewer or evaluator asks “how is your work different from existing calibration?”, the precise answer is:
Calibration is one of eight composed streams. The framework’s contribution is not a new calibration method — it is the composition of calibration with adversarial detection, integrity primitives, and an audit substrate sufficient for high-stakes deployment. The empirical headline (REL ~9.6× better than averaging on the validation dataset) reflects the composed pipeline, not Stream 5 alone.
This is the framing used in Paper 2 and is the correct one for academic citation.