

What the principles are

In February 2020, the U.S. Department of Defense formally adopted five AI Ethical Principles applicable to all DoD use of AI in both combat and non-combat applications:
| Principle | Plain-language summary |
| --- | --- |
| Responsible | DoD personnel will exercise appropriate levels of judgment and care while being responsible for AI capabilities. |
| Equitable | Deliberate steps will be taken to minimize unintended bias in AI capabilities. |
| Traceable | AI capabilities will be developed and deployed such that personnel possess an appropriate understanding of the technology, development processes, and operational methods. |
| Reliable | AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses. |
| Governable | AI capabilities will be designed and engineered to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior. |
These principles are operationalized through the DoD Responsible AI Strategy and Implementation Pathway and through specific service-level guidance (Air Force, Navy, Army CDAO directives).

Coverage summary

VERDICT WEIGHT addresses Traceable, Reliable, and Governable with strong technical artifacts. Responsible is supported through interpretability and audit primitives that enable judgment. Equitable is supported through per-source coherence analysis and cross-group reproducibility, with the standard caveat that bias mitigation is fundamentally a deployment-data and use-case concern that no scoring layer can fully address.

Principle: Responsible

The Responsible principle requires that personnel exercise appropriate levels of judgment and care. VERDICT WEIGHT supports judgment by making the basis of each decision inspectable.
| Required capability | How VERDICT WEIGHT supports it |
| --- | --- |
| Personnel can review AI decision rationale | Per-stream contributions and reason strings make rationale accessible without specialist tooling. |
| Personnel can identify when AI is operating outside its envelope | Stream 2 epistemic uncertainty and Stream 4 cross-source coherence flag out-of-distribution conditions. |
| Personnel can defer or override AI decisions | Calibrated confidence + abstention + abort outcomes provide explicit defer/override signals. Override events are recorded. |
| Accountability chain is preserved | Operator identity recorded for kill-switch and configuration events. Audit chain attributes every decision to a configuration version. |
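The kind of inspectable, per-stream rationale described above can be sketched as a plain data structure. This is an illustrative sketch only; the class and field names (`DecisionRecord`, `StreamContribution`, `rationale`) are assumptions for this example, not the framework's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class StreamContribution:
    stream: str    # e.g. "stream2_epistemic" (illustrative name)
    weight: float  # signed contribution to the final score
    reason: str    # human-readable reason string

@dataclass
class DecisionRecord:
    decision_id: str
    outcome: str         # "accept" | "abstain" | "abort"
    confidence: float
    config_version: str  # ties the decision to a configuration version
    contributions: list = field(default_factory=list)

    def rationale(self) -> str:
        """Render the per-stream rationale without specialist tooling."""
        return "\n".join(
            f"{c.stream}: weight={c.weight:+.2f} ({c.reason})"
            for c in self.contributions
        )

record = DecisionRecord("d-001", "abstain", 0.41, "cfg-7f3a")
record.contributions.append(
    StreamContribution("stream2_epistemic", -0.30,
                       "epistemic uncertainty above validated envelope"))
print(record.rationale())
```

The point of the sketch is that rationale is stored as structured data attributed to a configuration version, so a reviewer can read it directly rather than reverse-engineering it from model internals.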

Principle: Equitable

The Equitable principle requires deliberate steps to minimize unintended bias. The framework supports this with technical primitives but does not substitute for the substantive bias-mitigation work, which must happen at the data and use-case layer.
| Required capability | How VERDICT WEIGHT supports it |
| --- | --- |
| Bias detection across subgroups | Audit-chain replay supports per-subgroup analysis of confidence distributions, abstention rates, and outcome distributions. |
| Differential reliability assessment | Calibration can be measured per subgroup; reliability error gaps surface differential reliability. |
| Source-correlation diagnosis | Stream 4’s coherence scoring detects when sources move in lockstep, which is a known mechanism for bias amplification. |
| Documentation of bias evaluation | Validation reproducibility supports independent bias evaluation. |
No scoring layer can guarantee an unbiased AI system. Bias originates in data, model, and use-case framing. VERDICT WEIGHT provides the substrate to measure bias and track it; mitigation remains a deployment activity.
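The per-subgroup calibration measurement mentioned above can be sketched with a standard expected calibration error (ECE) computed group by group. This is a generic illustration, not the framework's implementation; the function names and the ten-bin choice are assumptions:

```python
from collections import defaultdict

def ece(confidences, outcomes, n_bins=10):
    """Expected calibration error: weighted |mean confidence - accuracy| per bin."""
    bins = defaultdict(list)
    for c, y in zip(confidences, outcomes):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, y))
    total = len(confidences)
    err = 0.0
    for members in bins.values():
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        err += len(members) / total * abs(avg_conf - accuracy)
    return err

def calibration_gap(by_group):
    """Largest ECE spread across subgroups; a large gap flags differential reliability."""
    scores = {g: ece(conf, out) for g, (conf, out) in by_group.items()}
    return max(scores.values()) - min(scores.values()), scores

# Toy replay data: identical confidences, different realized accuracy per group.
gap, per_group = calibration_gap({
    "group_a": ([0.9, 0.8, 0.7], [1, 1, 1]),
    "group_b": ([0.9, 0.8, 0.7], [1, 0, 0]),
})
```

Run against audit-chain replay data, a gap like this surfaces differential reliability; deciding whether the gap is acceptable remains the deployment-layer bias-evaluation work described above.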

Principle: Traceable

This is where VERDICT WEIGHT does the most work. The Traceable principle is essentially a requirement for the kind of transparent, reviewable, well-documented system the framework is built to support.
| Required capability | How VERDICT WEIGHT supports it |
| --- | --- |
| Personnel understand the technology | Documentation site, papers, and reproducibility pipeline support understanding at multiple depths. |
| Personnel understand the development process | Public source, public test suite (673 tests across 27 suites), public validation. |
| Personnel understand operational methods | Per-stream documentation, operational runbooks for kill switch and audit chain. |
| Decision provenance is traceable | Cryptographic audit chain (Stream 7) provides record-level provenance for every decision. |
| Configuration provenance is traceable | Registry hash recorded with each event; configuration changes recorded as discrete events. |
| Build provenance is traceable | Three-source integrity check (PyPI / GitHub / Zenodo) provides build provenance. See Verification. |
The traceability story is structurally the strongest element of the framework’s DoD positioning. A signed, hash-chained audit log that survives independent verification is the artifact that makes DoD’s traceable principle operational rather than aspirational.
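The hash-chaining idea behind such an audit log can be shown in a few lines: each record's hash covers the previous record's hash, so editing or deleting any record invalidates every record after it. This is a minimal generic sketch of the technique, not Stream 7's actual format or signing scheme:

```python
import hashlib
import json

GENESIS = "0" * 64

def record_hash(prev_hash, payload):
    """Hash covers the previous record's hash, chaining the log."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()

def append(log, payload):
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload,
                "hash": record_hash(prev, payload)})

def verify(log):
    """Replay the chain; any edited or deleted record breaks every later hash."""
    prev = GENESIS
    for rec in log:
        if rec["prev"] != prev or rec["hash"] != record_hash(prev, rec["payload"]):
            return False
        prev = rec["hash"]
    return True

log = []
append(log, {"decision": "d-001", "config": "cfg-7f3a"})
append(log, {"decision": "d-002", "config": "cfg-7f3a"})
assert verify(log)
log[0]["payload"]["decision"] = "tampered"  # retroactive edit...
assert not verify(log)                      # ...is detected on replay
```

A production chain would additionally sign each link; the structural property shown here, that tampering is detectable by independent replay, is what makes the traceability claim verifiable rather than asserted.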

Principle: Reliable

The Reliable principle requires explicit, well-defined uses and testing/assurance within those uses. VERDICT WEIGHT supports this through its threat model boundaries and its testing rigor.
| Required capability | How VERDICT WEIGHT supports it |
| --- | --- |
| Explicit, well-defined uses | Threat model and the failure-class taxonomy enumerate the scope explicitly. |
| Safety testing | 673-test suite across 27 suites including fuzz, mutation, formal verification. See Coverage overview. |
| Security testing | Adversarial-input testing for Stream 6; audit-chain integrity testing; registry hash protection testing. |
| Effectiveness testing | Head-to-head benchmarks, ablation studies, calibration curves — all reproducible. |
| Assurance evidence | IEEE-grade hardening procedure addresses all seven standard reviewer attack categories. See Paper 2. |
| Out-of-envelope detection | Stream 2 (epistemic) and Stream 4 (coherence) flag operating conditions outside the validated envelope. |
The Reliable principle’s “well-defined uses” requirement is satisfied by stating not just what the framework does but explicitly what it does not claim. The Known limitations page is the canonical reference.
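The out-of-envelope detection listed above amounts to gating each decision on the validated operating envelope. A minimal sketch, in which the function name, signal names, and threshold values are all illustrative assumptions rather than the framework's real parameters:

```python
def within_envelope(epistemic_uncertainty, source_coherence,
                    max_uncertainty=0.35, min_coherence=0.6):
    """Gate a decision on the validated envelope (thresholds are illustrative).

    Returns (ok, reason) so the reason string can feed the decision rationale.
    """
    if epistemic_uncertainty > max_uncertainty:
        return False, "epistemic uncertainty above validated envelope"
    if source_coherence < min_coherence:
        return False, "cross-source coherence below validated envelope"
    return True, "within envelope"

ok, why = within_envelope(epistemic_uncertainty=0.12, source_coherence=0.91)
blocked, why_not = within_envelope(epistemic_uncertainty=0.52, source_coherence=0.91)
```

The design point is that "well-defined uses" becomes enforceable: conditions outside the tested envelope produce an explicit, recordable refusal rather than a silently degraded answer.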

Principle: Governable

This is where the framework is structurally distinguished from general-purpose AI tooling. The Governable principle requires the ability to detect unintended behavior and to disengage. VERDICT WEIGHT is built around exactly these primitives.
| Required capability | How VERDICT WEIGHT supports it |
| --- | --- |
| Detect unintended behavior | Calibration drift, abstention rate increase, kill-switch trigger rate, and Stream 6 detection rate are all monitorable signals of unintended operation. |
| Detect adversarial behavior | Stream 6 detects Curveball-class attacks; Stream 7 detects audit-chain tampering; registry hashing detects configuration manipulation. |
| Disengage on demand | Operator-issued kill switch (Stream 8) is a binary, deterministic, and immediate disengagement primitive. |
| Disengage on automatic detection | The same kill switch triggers automatically on integrity violations, adversarial detections above threshold, and self-check failures. |
| Re-engagement is deliberate | Lowering the kill switch is a documented, audit-recorded operator action. There is no automated recovery path. |
| Disengagement is auditable | Every kill-switch event records the trigger, the operator, and the justification. |
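The disengagement properties in this table can be sketched as a latch whose state changes are always recorded and whose re-engagement path demands an explicit justification. The class and method names here are assumptions for illustration, not Stream 8's actual interface:

```python
import datetime

class KillSwitch:
    """Disengagement latch: raising is deterministic and immediate;
    lowering requires a recorded operator justification."""

    def __init__(self):
        self.engaged = False
        self.events = []  # audit trail: (utc_timestamp, action, operator, reason)

    def _record(self, action, operator, reason):
        ts = datetime.datetime.now(datetime.timezone.utc).isoformat()
        self.events.append((ts, action, operator, reason))

    def raise_switch(self, operator, reason):
        """Called by operators or by automatic triggers; idempotent, never blocks."""
        self.engaged = True
        self._record("raise", operator, reason)

    def lower_switch(self, operator, justification):
        """Deliberate re-engagement only; there is no automated caller of this."""
        if not justification:
            raise ValueError("re-engagement requires a recorded justification")
        self.engaged = False
        self._record("lower", operator, justification)

ks = KillSwitch()
ks.raise_switch("auto:stream6", "adversarial detections above threshold")
assert ks.engaged
ks.lower_switch("op-417", "post-incident review complete")
```

Note the asymmetry: raising accepts any caller (human or automatic trigger), while lowering validates its input and leaves an attributable record, mirroring the "re-engagement is deliberate" requirement.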

Audit artifacts produced

For DoD review processes (Defense Innovation Unit / Joint AI Center / service-level RAI offices), the framework produces:
| Artifact | What it evidences |
| --- | --- |
| Hash-chained audit log | Traceable principle – decision-level provenance. |
| Test suite results | Reliable principle – engineering-grade testing. |
| Validation reproducibility | Reliable principle – independent verification. |
| Threat model and failure taxonomy | Reliable principle – well-defined use envelope. |
| Kill-switch event log | Governable principle – disengagement evidence. |
| Per-stream interpretability data | Responsible / Traceable principles. |
| Calibration metrics over time | Reliable / Governable – sustained reliability. |

What the operator still owns

The framework’s support for the principles does not extend to:
  • Mission-specific use case framing — the operator defines what this AI capability is for and what success means.
  • Personnel training — the operator is responsible for ensuring personnel can in fact exercise judgment.
  • Bias evaluation in mission context — the framework provides primitives; mission-relevant bias review is the operator’s.
  • RAI governance and review — service-level RAI offices conduct review; the framework provides artifacts to support it.
  • Continuity-of-operations planning — what happens when the kill switch fires in a contested environment is a mission-planning concern.

Path to operationalization

A defense pilot following the pilot engagement process produces, by the end of Phase 3, a deployment that can be evaluated against the five principles with concrete artifacts in hand. This is the operational expression of the framework’s DoD positioning: not a marketing claim, but a path that ends with evidence.