Agent Reliability Dashboard
Based on 20 documented incidents. Average severity: 5.6/10
Agent Risk Scores
| Agent | Incidents | Avg Severity | Critical | High | Risk Level |
|---|---|---|---|---|---|
| github-copilot | 1 | 10.0 | 1 | 0 | HIGH RISK |
| amazon-ai-agent | 1 | 8.4 | 0 | 1 | HIGH RISK |
| windsurf | 1 | 7.5 | 0 | 1 | HIGH RISK |
| claude-code | 3 | 6.5 | 1 | 1 | MODERATE |
| aider | 1 | 6.3 | 0 | 0 | MODERATE |
| devin | 10 | 5.0 | 3 | 0 | MODERATE |
| cursor | 2 | 3.8 | 0 | 0 | LOW RISK |
| autogpt | 1 | 2.9 | 0 | 0 | LOW RISK |
Failure Mode Distribution
Recent Critical Incidents
STUPID-2026-0017
10.0/10
devin
Devin replaced entire medical website with unrelated renal care site
STUPID-2026-0019
7.5/10
claude-code
Claude Opus 4.5 leaked API key in console logs during YouTube scraper build
STUPID-2026-0020
8.4/10
amazon-ai-agent
Amazon AI coding agent mistake blamed on human employees
STUPID-2026-0006
10.0/10
devin
Devin confidently shipped code that passed tests but had a SQL injection vulnerability
STUPID-2026-0001
10.0/10
devin
Devin deleted all migration files during auth refactor