Triage Metrics
Accuracy Overview
Weighted accuracy: Category 40% + Severity 40% + Route 20%. Based on 1 reviewed ticket.
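The weighting above can be written out as a short sketch. The weights come from the dashboard text; the function name and the 0 to 1 accuracy scale are assumptions made for illustration.

```python
# Weights taken from the dashboard text: Category 40%, Severity 40%, Route 20%.
WEIGHTS = {"category": 0.4, "severity": 0.4, "route": 0.2}


def weighted_accuracy(category_acc, severity_acc, route_acc):
    """Combine three per-field accuracies (each in 0..1) into one score.

    Hypothetical helper: the name and signature are not from the dashboard.
    """
    return (
        WEIGHTS["category"] * category_acc
        + WEIGHTS["severity"] * severity_acc
        + WEIGHTS["route"] * route_acc
    )


# The single reviewed ticket was correct on all three fields.
score = weighted_accuracy(1.0, 1.0, 1.0)
```

Because Category and Severity each carry twice the weight of Route, a wrong route costs 20 points while a wrong category or severity costs 40.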
Accuracy by Category
Weighted accuracy per category (Category 40% + Severity 40% + Route 20%). Override rate shows how often the reviewer disagreed with the AI.
| Category | Tickets | Category Acc. | Severity Acc. | Route Acc. | Weighted Acc. | Override Rate |
|---|---|---|---|---|---|---|
| Login/Auth | 1 | 100% | 100% | 100% | 100% | 0% |
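A minimal sketch of how one row of the table above might be derived from reviewed tickets. It makes three assumptions the dashboard does not confirm: a ticket counts as an override when the reviewer changed any of the three fields, tickets are grouped by the AI-predicted category, and each review is a dict with hypothetical `ai` and `human` keys.

```python
from collections import defaultdict

FIELDS = ("category", "severity", "route")
WEIGHTS = {"category": 0.4, "severity": 0.4, "route": 0.2}


def per_category_stats(reviews):
    """Aggregate reviewed tickets into per-category accuracy rows.

    reviews: list of {"ai": {...}, "human": {...}} dicts holding the three FIELDS.
    Grouping by the AI-predicted category is an assumption.
    """
    groups = defaultdict(list)
    for review in reviews:
        groups[review["ai"]["category"]].append(review)

    stats = {}
    for category, items in groups.items():
        n = len(items)
        # Per-field accuracy: share of tickets where AI and reviewer agree.
        acc = {f: sum(r["ai"][f] == r["human"][f] for r in items) / n for f in FIELDS}
        weighted = sum(WEIGHTS[f] * acc[f] for f in FIELDS)
        # Assumed definition: an override is any ticket with at least one changed field.
        overrides = sum(any(r["ai"][f] != r["human"][f] for f in FIELDS) for r in items)
        stats[category] = {
            "tickets": n,
            **acc,
            "weighted": weighted,
            "override_rate": overrides / n,
        }
    return stats


# The one reviewed ticket shown on this page.
reviews = [
    {
        "ai": {"category": "Login/Auth", "severity": "P2", "route": "Support L1"},
        "human": {"category": "Login/Auth", "severity": "P2", "route": "Support L1"},
    }
]
stats = per_category_stats(reviews)
```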
Governance — Forced Review
| Gating Reason | Count |
|---|---|
| High Severity (P1/P2) | 0 |
| Sensitive Category | 0 |
| Low Confidence | 0 |
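The three gating reasons above could be checked along these lines. This is only a sketch: the dashboard lists the reasons but not the rules behind them, so the sensitive-category set, the confidence threshold, and the function name are all hypothetical.

```python
# Hypothetical values: the dashboard does not list the sensitive categories
# or the confidence threshold it uses.
SENSITIVE_CATEGORIES = {"Billing", "Security"}
LOW_CONFIDENCE_THRESHOLD = 0.7


def gating_reasons(category, severity, confidence):
    """Return the reasons, if any, that would force a human review."""
    reasons = []
    if severity in ("P1", "P2"):
        reasons.append("High Severity (P1/P2)")
    if category in SENSITIVE_CATEGORIES:
        reasons.append("Sensitive Category")
    if confidence < LOW_CONFIDENCE_THRESHOLD:
        reasons.append("Low Confidence")
    return reasons


# A routine ticket passes through; a risky one collects several reasons.
routine = gating_reasons("Login/Auth", "P3", 0.9)
risky = gating_reasons("Billing", "P1", 0.5)
```

A ticket can trip more than one rule, so the counts in the table need not sum to the number of gated tickets.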
Confidence Calibration
When the AI reports a confidence level, how often is it actually correct? A well-calibrated model's accuracy should be close to its stated confidence. Based on 1 reviewed ticket.
pp = percentage points. Green = well-calibrated (within 5pp). Amber = slightly over-confident. Red = significantly over-confident.
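The colour legend above can be sketched as a small classifier over the gap between stated confidence and observed accuracy. Only the 5pp band comes from the legend; the amber/red cut-off of 15pp, the handling of under-confident buckets, and the function name are assumptions.

```python
WELL_CALIBRATED_PP = 5.0  # from the legend: within 5pp is green
RED_PP = 15.0             # hypothetical cut-off between amber and red


def calibration_status(stated_confidence_pct, actual_accuracy_pct):
    """Classify one confidence bucket by its calibration gap in percentage points."""
    # Positive gap means the model claimed more confidence than it earned.
    gap = stated_confidence_pct - actual_accuracy_pct
    if abs(gap) <= WELL_CALIBRATED_PP:
        return "green"
    if gap < 0:
        # The legend does not cover this case; labelled separately as an assumption.
        return "under-confident"
    return "amber" if gap <= RED_PP else "red"


status = calibration_status(90.0, 80.0)  # 10pp over-confident
```

With a single reviewed ticket, each bucket's accuracy can only be 0% or 100%, so the gap is not yet a meaningful calibration signal.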
Prompt Version Tracking
Accuracy breakdown by prompt version and model. Use this to compare performance across prompt iterations.
| Prompt Version | Model | Tickets Triaged | Reviewed | Category Acc. | Severity Acc. | Route Acc. | Weighted Acc. |
|---|---|---|---|---|---|---|---|
| triage-v1 | deepseek-chat | 5 | 1 | 100% | 100% | 100% | 100% |
Ticket Distribution
Breakdown of AI-triaged tickets by category, severity, and route. Based on 5 triaged tickets.
[Charts: Category, Severity, Route]
Confusion Matrices
Rows = AI predicted · Columns = Human final decision. Diagonal = correct predictions (highlighted).
Category
| Predicted ↓ / Actual → | Login/Auth |
|---|---|
| Login/Auth | 1 |
Severity
| Predicted ↓ / Actual → | P2 |
|---|---|
| P2 | 1 |
Route
| Predicted ↓ / Actual → | Support L1 |
|---|---|
| Support L1 | 1 |
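A confusion matrix like the ones above can be built from (predicted, actual) pairs. The sketch below uses only the standard library; the function names are illustrative and not part of the dashboard.

```python
from collections import Counter


def confusion_matrix(pairs):
    """Build a nested dict: rows are AI predictions, columns are human decisions.

    pairs: list of (predicted, actual) label tuples.
    """
    counts = Counter(pairs)
    predicted_labels = sorted({p for p, _ in pairs})
    actual_labels = sorted({a for _, a in pairs})
    return {
        p: {a: counts[(p, a)] for a in actual_labels}
        for p in predicted_labels
    }


def diagonal_accuracy(pairs):
    """Share of pairs on the diagonal, i.e. where prediction equals the final decision."""
    return sum(p == a for p, a in pairs) / len(pairs)


# The single reviewed ticket: predicted P2, reviewer confirmed P2.
severity_pairs = [("P2", "P2")]
matrix = confusion_matrix(severity_pairs)
accuracy = diagonal_accuracy(severity_pairs)
```

Off-diagonal cells show which labels the AI confuses with which. With only one reviewed ticket, every matrix here collapses to a single diagonal cell.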