The category
ML observability platforms monitor production AI systems for drift, performance degradation, and operational anomalies. The category includes Arize, Fiddler, Arthur, WhyLabs, and a number of newer LLM-specific entrants. These are mature, well-funded products with strong customer bases. The category provides genuinely valuable functionality. It is also a different functional category from VERDICT WEIGHT.
What ML observability platforms do
The recognizable capabilities:
- Performance monitoring — track model accuracy, latency, and throughput over time.
- Drift detection — alert when input or prediction distributions shift from baseline.
- Data quality monitoring — flag missing values, schema violations, and statistical anomalies.
- Model explainability — SHAP values, LIME, attention visualizations.
- A/B testing infrastructure — compare model variants in production.
- Bias and fairness monitoring — track outcome disparities across protected groups.
- LLM-specific evaluations — toxicity scoring, hallucination detection, semantic similarity.
- Dashboards and alerting — visualization and notification on monitored signals.
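Drift detection, the second capability above, is typically a distributional comparison between a baseline window and a current window. A minimal generic sketch (not any particular platform's implementation) is a two-sample Kolmogorov-Smirnov statistic:

```python
import bisect

def ks_statistic(baseline, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of two samples. A large value suggests the current
    distribution has drifted from the baseline."""
    b, c = sorted(baseline), sorted(current)
    n, m = len(b), len(c)
    d = 0.0
    for x in set(b) | set(c):
        cdf_b = bisect.bisect_right(b, x) / n
        cdf_c = bisect.bisect_right(c, x) / m
        d = max(d, abs(cdf_b - cdf_c))
    return d
```

In practice platforms pair a statistic like this with a significance threshold and alerting; the point here is only that drift detection compares populations over time, not individual calls.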
Where the categories diverge
The structural difference is position in the pipeline:
- Observability platforms sit outside the inference path, ingesting logs and producing aggregate signals over time.
- VERDICT WEIGHT sits inside the inference path, computing a calibrated confidence value that the inference path itself uses to gate decisions.
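The in-path position can be illustrated with a hypothetical gate. VERDICT WEIGHT's actual API is not shown here; the thresholds, the `Decision` shape, and the three outcomes are illustrative assumptions:

```python
from dataclasses import dataclass

ACT_THRESHOLD = 0.85      # illustrative: act only above this confidence
ABSTAIN_THRESHOLD = 0.50  # illustrative: below this, refuse outright

@dataclass
class Decision:
    action: str        # "act", "escalate", or "abstain"
    confidence: float  # calibrated confidence for this single inference

def gate(confidence: float) -> Decision:
    """Inference-path gating: the confidence value decides what happens
    to THIS call, rather than feeding an aggregate dashboard."""
    if confidence >= ACT_THRESHOLD:
        return Decision("act", confidence)
    if confidence >= ABSTAIN_THRESHOLD:
        return Decision("escalate", confidence)  # e.g. route to human review
    return Decision("abstain", confidence)
```

The difference from observability is visible in the control flow: the confidence value is consumed synchronously, before the system acts.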
Capability comparison
| Capability | Observability platform | VERDICT WEIGHT |
|---|---|---|
| Aggregate performance dashboards | Yes | No |
| Drift detection over time | Yes | Per-call only (Stream 3) |
| Data quality monitoring | Yes | Limited (Stream 1) |
| SHAP / LIME explainability | Yes | No |
| Bias and fairness reporting | Yes | Substrate only (audit-chain replay) |
| LLM toxicity / hallucination scoring | Yes | No |
| Calibrated confidence per decision | Limited | Yes |
| Per-decision audit record (cryptographic) | No | Yes |
| Adversarial input detection | Limited | Yes (Stream 6) |
| Kill switch primitive | No | Yes (Stream 8) |
| Integration into the inference path | No | Yes |
Where they complement
A defense-grade or regulated deployment plausibly runs both:
- VERDICT WEIGHT in the inference path produces calibrated confidence, the audit record, and the gating decision.
- An observability platform ingests the audit-chain records (or a derivative log) and produces aggregate dashboards on confidence distribution, abstention rate, kill-switch event rate, and per-stream behavior over time.
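A sketch of that second half, assuming a derivative log of per-call records. The field names (`outcome`, `confidence`) are illustrative assumptions, not the framework's actual schema:

```python
from collections import Counter

def summarize(records):
    """Roll per-call audit records up into the fleet-level signals an
    observability platform would chart: abstention rate, kill-switch
    event count, and mean confidence."""
    outcomes = Counter(r["outcome"] for r in records)
    total = len(records)
    return {
        "abstention_rate": outcomes["abstain"] / total,
        "kill_switch_events": outcomes["kill_switch"],
        "mean_confidence": sum(r["confidence"] for r in records) / total,
    }
```

The aggregation lives outside the inference path, which is exactly the division of labor the two categories suggest.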
What VERDICT WEIGHT does not try to be
To be precise about scope:
- Not a dashboard. The framework produces structured data; it does not render it.
- Not a long-term-trends tracker. Audit-chain records can be aggregated externally; aggregation is not the framework’s job.
- Not a SaaS. No managed-service component. No data leaves the deployment.
- Not a model explainability tool. Per-stream contributions are operationally interpretable but are not SHAP / LIME.
- Not a fairness monitor. Audit records support per-subgroup analysis; the analysis itself is operator-supplied.
When to choose which
Is the question 'what is happening across this fleet of models over time?'
That is observability. VERDICT WEIGHT does not do this.
Is the question 'should the system act on this specific decision, with what evidence, and with what audit record?'
That is VERDICT WEIGHT. Observability platforms do not do this.
Is the question 'has the input distribution shifted this week?'
Drift detection is observability. The framework’s per-call stability checks (Stream 3) are not a substitute for distributional drift monitoring.
Is the question 'do I have a tamper-evident record of the decision?'
That is the audit chain (Stream 7). Observability platforms produce logs, not cryptographically chained records.
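The distinction can be made concrete with a generic hash-chain sketch. Stream 7's actual record format is not specified here; this only shows why chained records are tamper-evident where plain logs are not:

```python
import hashlib
import json

def append_record(chain, record):
    """Append a record whose hash covers both its own payload and the
    previous entry's hash, so altering any record breaks every
    subsequent link."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(record, sort_keys=True)
    digest = hashlib.sha256((prev + payload).encode()).hexdigest()
    chain.append({"prev": prev, "payload": payload, "hash": digest})

def verify(chain):
    """Recompute every link; any tampering surfaces as a mismatch."""
    prev = "0" * 64
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if hashlib.sha256((prev + entry["payload"]).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A plain log line can be edited after the fact with no trace; here, editing any payload invalidates the chain from that point forward.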
A precise framing for acquisition or board-level review
The cleanest framing of the relationship between the categories:
- Observability platforms answer operational questions about AI systems running in production.
- VERDICT WEIGHT answers decisional questions inside individual AI inferences.