Claude-Code
AI Agent Reliability Report
Frequently Asked Questions
Is Claude-Code reliable?
Based on 4 documented incidents, Claude-Code has an average failure severity of 6.8/10. One incident was rated critical and two were rated high severity. The most common failure mode is security vulnerability.
What are the most common Claude-Code failures?
The most frequently documented Claude-Code failure modes are security vulnerability (2 incidents), destructive action (1 incident), and hallucination (1 incident). These failures range from critical to high severity.
How many Claude-Code AI failures have been documented?
StupidLLM has documented 4 Claude-Code AI agent failures as of 2026. Each incident is severity-scored on a 0-10 scale, verified against source evidence, and categorized by failure mode and root cause.
All Claude-Code Incidents
Claude Code ran rm -rf on test fixtures thinking they were temp files
Asked to clean up temporary test artifacts, Claude Code misidentified the tests/fixtures/ directory as temporary files and ran rm -rf on it. The fixtures contained 3 months of careful...
Claude Opus 4.5 leaked API key in console logs during YouTube scraper build
While building a YouTube scraper, Claude Opus 4.5 implemented logging naively, exposing the API key in plain text in the console output. The developer had to add explic...
Claude Code MCP trust boundary failures allow workspace privilege escalation
Security researcher Jashid Sany documented three systemic trust boundary failures in Claude Code v2.1.63 related to the Model Context Protocol (MCP): (1) weak MCP server configurat...
Claude Code hallucinated a non-existent npm package and installed it
While building a date picker component, Claude Code suggested using 'react-temporal-picker', a package that doesn't exist on npm. It proceeded to write import statements and compon...
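As an illustration of the API key incident listed above, the following is a minimal sketch of one common mitigation: a logging filter that masks known secret values before they reach the console. It is not taken from the incident report. The names RedactSecretsFilter and build_logger, the "[REDACTED]" placeholder, and the example key are all hypothetical, and the sketch assumes Python's standard logging module.

```python
import logging
import sys


class RedactSecretsFilter(logging.Filter):
    """Hypothetical filter that masks known secret values in log messages."""

    def __init__(self, secrets, placeholder="[REDACTED]"):
        super().__init__()
        # Skip empty strings: replacing "" would corrupt every message.
        self._secrets = [s for s in secrets if s]
        self._placeholder = placeholder

    def filter(self, record):
        # Render the message with its arguments first, so secrets passed
        # as %-style arguments are masked as well.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, self._placeholder)
        record.msg = message
        record.args = ()
        return True  # keep the record, now with secrets masked


def build_logger(name, secrets, stream=sys.stdout):
    """Return a logger whose handler masks the given secrets (illustrative helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactSecretsFilter(secrets))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    api_key = "sk-example-not-a-real-key"  # placeholder value, not a real key
    log = build_logger("scraper", [api_key])
    log.info("Requesting videos with key=%s", api_key)
```

This approach only masks secrets that are registered with the filter and that appear in the formatted message. It does not cover tracebacks, output written with print, or secrets that are transformed (for example, URL-encoded) before logging.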