Sources
| Source | Used for |
|---|---|
| NIST National Vulnerability Database (NVD) | Per-CVE evidence (CVSS metrics, descriptions, references). |
| CISA Known Exploited Vulnerabilities (KEV) | Positive-class anchor for ground-truth labeling. |
Snapshot strategy
Public vulnerability data evolves continuously. To make the benchmark reproducible, the validation pipeline supports two modes:

- Live mode: pulls current data at benchmark time. Useful for ongoing internal validation.
- Snapshot mode: uses a pinned snapshot of NVD and KEV at a specified date. Required for cross-deployment comparison and for reproducing published results.
Evidence-vector construction
Each CVE record is mapped to an evidence vector consumed by Stream 1. The mapping is fixed, documented, and published with the framework:

| Evidence channel | Source field | Normalization |
|---|---|---|
| Exploit-availability | NVD cvssV3.exploitabilityScore | Linear scale to [0, 1]. |
| Severity (vendor) | NVD cvssV3.baseSeverity | Ordinal mapping (low=0.25 … critical=1.0). |
| Reference quality | Count and type of references | Heuristic in [0, 1]. |
| Description specificity | NVD descriptions[*].value length and entity density | Heuristic in [0, 1]. |
| Vendor advisory | Presence of vendor security advisory in references | Binary. |
The mapping is implemented in verdict_weight/benchmarks/real_world/evidence.py. Any change to the mapping changes the benchmark; published results pin the mapping version.
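Three of the table's normalizations can be sketched directly. The field names below follow the NVD JSON layout loosely, and the function name and output keys are assumptions for illustration, not the published mapping; the 3.9 divisor assumes the CVSS v3 exploitability subscore's maximum.

```python
def evidence_vector(cve: dict) -> dict[str, float]:
    """Sketch of three evidence-channel normalizations (schema assumed)."""
    cvss = cve.get("cvssV3", {})

    # Exploit-availability: CVSS v3 exploitabilityScore ranges up to 3.9;
    # scale linearly into [0, 1] and clamp defensively.
    exploit = min(cvss.get("exploitabilityScore", 0.0) / 3.9, 1.0)

    # Severity (vendor): ordinal mapping, low=0.25 ... critical=1.0.
    severity_map = {"LOW": 0.25, "MEDIUM": 0.5, "HIGH": 0.75, "CRITICAL": 1.0}
    severity = severity_map.get(cvss.get("baseSeverity", ""), 0.0)

    # Vendor advisory: binary presence of a vendor-advisory reference.
    refs = cve.get("references", [])
    advisory = 1.0 if any("Vendor Advisory" in r.get("tags", []) for r in refs) else 0.0

    return {
        "exploit_availability": exploit,
        "severity": severity,
        "vendor_advisory": advisory,
    }
```

The reference-quality and description-specificity heuristics are omitted here; they live only in the published mapping code.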
Ground-truth labeling
A CVE is labeled positive (high confidence in real-world risk) if and only if it appears in the CISA KEV catalog at the snapshot date. This is a deliberately conservative labeling rule:

- KEV inclusion is documented evidence of real-world exploitation.
- Non-inclusion is not evidence of safety; many CVEs are exploited in the wild without being added to KEV. The benchmark therefore should be read as a lower bound on positive-class recall.
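The labeling rule is small enough to state in full. The function name is illustrative; the substance is that the only input besides the CVE ID is the KEV membership set at the snapshot date.

```python
def label(cve_id: str, kev_ids: frozenset[str]) -> int:
    """Positive iff the CVE is in the KEV catalog snapshot.

    A 0 label means "not in KEV at the snapshot date", not "safe":
    the negative class is noisy by construction, which is why the
    benchmark reads as a lower bound on positive-class recall.
    """
    return 1 if cve_id in kev_ids else 0
```

Because KEV only grows over time, the same CVE can flip from 0 to 1 between snapshots, which is another reason published results must pin the snapshot date.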
Class balance
The 120-CVE dataset is class-balanced by construction: half the records are KEV-listed, half are not. Class balance is standard practice for calibration benchmarking, where imbalanced data can produce misleading reliability curves. For real deployments the class distribution will differ; refitting Stream 5’s calibration map on representative data (see Calibration) is the appropriate response.

Filtering and exclusions
The 120-CVE dataset is drawn from the NVD population by:

- Excluding records with missing CVSSv3 scores.
- Excluding records published in the last 30 days (insufficient time for KEV inclusion to settle).
- Stratified sampling to achieve class balance.
- Exclusions documented in the methodology code; nothing is excluded that is not visible in the published code.
Selection is implemented in verdict_weight/benchmarks/real_world/select.py.
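The selection rules above can be sketched as a single filter-then-sample function. Record field names (`id`, `cvssV3`, `published`) and the function signature are assumptions for this sketch, not the published selection code; a fixed seed stands in for whatever the framework pins.

```python
import random
from datetime import date, timedelta

def select_balanced(records: list[dict], kev_ids: set[str],
                    today: date, n: int = 120, seed: int = 0) -> list[dict]:
    """Sketch: exclude ineligible records, then stratify to class balance."""
    cutoff = today - timedelta(days=30)
    eligible = [
        r for r in records
        if r.get("cvssV3") is not None   # exclude missing CVSSv3 scores
        and r["published"] <= cutoff     # exclude the last 30 days
    ]
    positives = [r for r in eligible if r["id"] in kev_ids]
    negatives = [r for r in eligible if r["id"] not in kev_ids]
    rng = random.Random(seed)            # pinned seed keeps selection reproducible
    return rng.sample(positives, n // 2) + rng.sample(negatives, n // 2)
```

Every exclusion applied here corresponds to a rule stated in the methodology; anything not expressible as such a visible filter does not belong in the selection step.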