StupidLLM
The incident database for AI agent failures
When Devin deletes your migration files, when Cursor enters an infinite loop, when Copilot leaks your API keys — we document it. Severity-scored, verified, and searchable.
Latest Incidents
Devin docs PR rejected by Prefect maintainers — documented behavior from removed feature
Amazon AI coding agent mistake blamed on human employees
Claude Opus 4.5 leaked API key in console logs during YouTube scraper build
Devin added a pointless "Hello!" page to a disease prediction platform
Devin replaced entire medical website with unrelated renal care site
Devin PR broke ledger list API and created buckets on deleted resources
Devin repeatedly submitted identical docs PRs that kept getting rejected
Devin attempted to build entire Figma clone from scratch — 3 rejected attempts
Highest Severity
Devin confidently shipped code that passed tests but had a SQL injection vulnerability
Security Vulnerability
Claude Code ran rm -rf on test fixtures thinking they were temp files
Destructive Action
Copilot autocompleted AWS credentials into public repository
Security Vulnerability
Devin deleted all migration files during auth refactor
Destructive Action
Devin replaced entire medical website with unrelated renal care site
Destructive Action
What is StupidLLM?
StupidLLM is the open incident database for AI coding agent failures. Much as the CVE system assigns identifiers to cybersecurity vulnerabilities, we assign STUPID-IDs to documented cases where AI agents such as Devin, Cursor, Claude Code, GitHub Copilot, Windsurf, and Aider cause real damage: deleted files, security vulnerabilities, infinite loops, wasted resources, and broken production systems.
Every incident is severity-scored using our CVSS-inspired rating system, verified against source evidence, and searchable by agent, failure mode, and root cause. We track reliability trends across agents so developers and enterprises can make informed decisions about which AI tools to trust.
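To make the description above concrete, here is a minimal Python sketch of what an incident record and a search by agent, failure mode, and root cause could look like. It is an illustration under stated assumptions, not StupidLLM's actual schema or API. The field names, the `STUPID-EXAMPLE-n` identifiers, the numeric scores, and the root-cause strings are all hypothetical placeholders. The severity bands are borrowed from the CVSS v3.x qualitative scale on the assumption that a "CVSS-inspired" rating maps scores in a similar way.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Incident:
    incident_id: str   # hypothetical ID format, not the real STUPID-ID scheme
    agent: str
    failure_mode: str
    root_cause: str
    severity: float    # assumed 0.0-10.0 scale, as in CVSS
    title: str


def severity_band(score: float) -> str:
    """Map a score to the CVSS v3.x qualitative bands (assumed to apply here)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def search(
    incidents: Iterable[Incident],
    *,
    agent: Optional[str] = None,
    failure_mode: Optional[str] = None,
    root_cause: Optional[str] = None,
) -> List[Incident]:
    """Return incidents matching every given filter, highest severity first."""
    def matches(value: str, wanted: Optional[str]) -> bool:
        # A filter left as None matches everything; comparison ignores case.
        return wanted is None or value.lower() == wanted.lower()

    hits = [
        i for i in incidents
        if matches(i.agent, agent)
        and matches(i.failure_mode, failure_mode)
        and matches(i.root_cause, root_cause)
    ]
    return sorted(hits, key=lambda i: i.severity, reverse=True)


# Titles and categories come from the listings above; the IDs, scores, and
# root causes are placeholders for illustration, not real ratings.
EXAMPLES = [
    Incident("STUPID-EXAMPLE-1", "Devin", "Destructive Action", "unspecified",
             9.0, "Devin deleted all migration files during auth refactor"),
    Incident("STUPID-EXAMPLE-2", "Claude Code", "Destructive Action",
             "misidentified files as temporary", 8.0,
             "Claude Code ran rm -rf on test fixtures thinking they were temp files"),
    Incident("STUPID-EXAMPLE-3", "GitHub Copilot", "Security Vulnerability",
             "unspecified", 9.5,
             "Copilot autocompleted AWS credentials into public repository"),
]

if __name__ == "__main__":
    for incident in search(EXAMPLES, failure_mode="Destructive Action"):
        print(incident.incident_id, severity_band(incident.severity), incident.title)
```

Keeping the filters as optional keyword arguments means one function can serve all three search dimensions, alone or combined, without a separate query type for each.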