StupidLLM

The incident database for AI agent failures

When Devin deletes your migration files, when Cursor enters an infinite loop, when Copilot leaks your API keys — we document it. Severity-scored, verified, and searchable.

Incidents Documented: 24
Avg Severity: 5.7 / 10
Agents Tracked: 10

What is StupidLLM?

StupidLLM is the open incident database for AI coding agent failures. Much as the CVE system does for cybersecurity vulnerabilities, we assign STUPID-IDs to documented cases where AI agents such as Devin, Cursor, Claude Code, GitHub Copilot, Windsurf, and Aider cause real damage: deleted files, security vulnerabilities, infinite loops, wasted resources, and broken production systems.

Every incident is severity-scored using our CVSS-inspired rating system, verified against source evidence, and searchable by agent, failure mode, and root cause. We track reliability trends across agents so developers and enterprises can make informed decisions about which AI tools to trust.

How are AI agent incidents scored?

Every incident is severity-scored on a 0-10 scale using a CVSS-inspired rating system. Scores of 9-10 are critical, 7-8 are high, 4-6 are medium, and 0-3 are low severity. Incidents are verified against source evidence and categorized by failure mode (hallucination, destructive action, infinite loop, etc.) and root cause.
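As a rough illustration of the bands described above, the Python sketch below maps a score to its severity label. The function name `severity_band` is hypothetical, and the handling of fractional scores (for example, 8.5 counting as high) is an assumption, since only whole-number ranges are stated here.

```python
def severity_band(score: float) -> str:
    """Map a 0-10 severity score to a band label.

    Assumption: fractional scores fall into the band whose lower
    bound they have reached (so 8.5 is "high" and 6.9 is "medium").
    Only the whole-number ranges come from the text above.
    """
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "critical"
    if score >= 7:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


# The site-wide average of 5.7 lands in the medium band.
print(severity_band(5.7))
```

Checking lower bounds from the top down keeps the function to one comparison per band and leaves no gaps between bands.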

Which AI coding agent has the most failures?

Visit the StupidLLM dashboard for live rankings of AI agents by failure rate. We currently track 24 incidents across 10 agents, including Devin, Cursor, Claude Code, GitHub Copilot, Windsurf, and Aider, along with their average severity scores and risk levels.

How can I report an AI agent failure?

You can report an AI agent incident by providing the agent name, what you asked it to do, what it actually did, and the severity of the impact. Source URLs (GitHub PRs, tweets, blog posts) help us verify incidents. Each report receives a STUPID-ID for tracking.
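The report fields listed above could be modeled roughly as follows. This is a sketch only: the class name `IncidentReport`, its field names, and the `problems` helper are all hypothetical and do not reflect StupidLLM's actual submission format. The STUPID-ID is left out because its format is not described here.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class IncidentReport:
    """Hypothetical shape of a report; field names are assumptions."""
    agent: str                # e.g. "Devin"
    requested_task: str       # what you asked the agent to do
    actual_behavior: str      # what it actually did
    severity: float           # impact on the 0-10 scale
    source_urls: list[str] = field(default_factory=list)  # optional evidence

    def problems(self) -> list[str]:
        """Return a list of validation problems (empty if none found)."""
        issues = []
        for name in ("agent", "requested_task", "actual_behavior"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is required")
        if not 0 <= self.severity <= 10:
            issues.append("severity must be between 0 and 10")
        for url in self.source_urls:
            parsed = urlparse(url)
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                issues.append(f"not a valid http(s) URL: {url}")
        return issues


# Illustrative values only; the URL is a placeholder.
report = IncidentReport(
    agent="Devin",
    requested_task="Add a column to the users table",
    actual_behavior="Deleted existing migration files",
    severity=8.0,
    source_urls=["https://example.com/pr/1"],
)
print(report.problems())
```

Returning a list of problems, rather than raising on the first one, lets a submitter see everything that needs fixing in one pass.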

AI Agent Failure Modes