Agent Reliability Dashboard
Based on 24 documented incidents. Average severity: 5.7/10
Agent Risk Scores
| Agent | Incidents | Avg Severity | Critical | High | Risk Level |
|---|---|---|---|---|---|
| github-copilot | 1 | 10.0 | 1 | 0 | HIGH RISK |
| unknown-agent | 1 | 10.0 | 1 | 0 | HIGH RISK |
| amazon-ai-agent | 1 | 8.4 | 0 | 1 | HIGH RISK |
| windsurf | 1 | 7.5 | 0 | 1 | HIGH RISK |
| claude-code | 4 | 6.8 | 1 | 2 | MODERATE |
| aider | 1 | 6.3 | 0 | 0 | MODERATE |
| devin | 11 | 5.0 | 3 | 0 | MODERATE |
| cursor | 2 | 3.8 | 0 | 0 | LOW RISK |
| autogpt | 1 | 2.9 | 0 | 0 | LOW RISK |
| claude | 1 | 0.8 | 0 | 0 | LOW RISK |
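
The headline figures can be cross-checked against the table. The sketch below assumes the overall average is the incident-weighted mean of the per-agent averages; the dashboard does not state how it is computed, so this is only a consistency check. `ROWS` and `overall_average` are illustrative names, and the per-agent averages are already rounded, so the result is approximate.

```python
# (agent, incidents, avg_severity) copied from the Agent Risk Scores table.
ROWS = [
    ("github-copilot", 1, 10.0),
    ("unknown-agent", 1, 10.0),
    ("amazon-ai-agent", 1, 8.4),
    ("windsurf", 1, 7.5),
    ("claude-code", 4, 6.8),
    ("aider", 1, 6.3),
    ("devin", 11, 5.0),
    ("cursor", 2, 3.8),
    ("autogpt", 1, 2.9),
    ("claude", 1, 0.8),
]


def overall_average(rows):
    """Return (total incidents, incident-weighted mean severity).

    Assumption: each agent's average is weighted by its incident count.
    """
    total = sum(count for _, count, _ in rows)
    weighted_sum = sum(count * avg for _, count, avg in rows)
    return total, weighted_sum / total


total, mean = overall_average(ROWS)
print(total, round(mean, 1))
```

Under that assumption the table reproduces both the incident count of 24 and the 5.7/10 average quoted at the top of the dashboard.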
Failure Mode Distribution
[Figure: failure mode distribution chart]
Recent Critical Incidents
| Incident | Severity | Agent | Summary |
|---|---|---|---|
| STUPID-2026-0022 | 10.0/10 | unknown-agent | AI vibe-coded Next.js app pinned vulnerable dependency; cryptominer compromised production server |
| STUPID-2026-0024 | 7.5/10 | claude-code | Claude Code MCP trust boundary failures allow workspace privilege escalation |
| STUPID-2026-0017 | 10.0/10 | devin | Devin replaced entire medical website with unrelated renal care site |
| STUPID-2026-0019 | 7.5/10 | claude-code | Claude Opus 4.5 leaked API key in console logs during YouTube scraper build |
| STUPID-2026-0020 | 8.4/10 | amazon-ai-agent | Amazon AI coding agent mistake blamed on human employees |