About StupidLLM — The AI Agent Incident Database

What is StupidLLM?

StupidLLM is the open incident database for AI coding agent failures. We assign a STUPID-ID to each documented case where an AI agent causes real damage, much as the CVE program assigns identifiers to publicly disclosed cybersecurity vulnerabilities.

We track failures from Devin, Cursor, Claude Code, GitHub Copilot, Windsurf, Aider, and every other AI coding agent. Each incident is severity-scored, verified against source evidence, and categorized by failure mode and root cause.

Why does this exist?

AI coding agents are deployed in production environments with real consequences. When an agent deletes migration files, introduces security vulnerabilities, or enters infinite loops burning compute credits, developers need a central place to:

1. Know what can go wrong before deploying an AI agent
2. Compare reliability across agents with data, not marketing
3. Learn from others' failures to prevent repeating them
4. Hold vendors accountable with documented, severity-scored evidence

How incidents are scored

We use a CVSS-inspired 0-10 severity scale:

Score   Severity   Typical impact
9-10    CRITICAL   Data loss, security breach, production down
7-8     HIGH       Significant damage, hours lost
4-6     MEDIUM     Wrong output, wasted time
0-3     LOW        Minor annoyance, easily caught
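
As a rough illustration of the scale above, the bands could be turned into a lookup like the following. This is a minimal sketch, not StupidLLM's actual scoring code: the function name severity_label is hypothetical, and because the scale lists only whole-number ranges, the handling of fractional scores (for example, treating 8.5 as HIGH) is an assumption.

```python
def severity_label(score: float) -> str:
    """Map a 0-10 severity score to its band label.

    Hypothetical helper. Fractional scores fall into the band whose
    lower bound they reach; that rule is an assumption, since the
    published scale only lists whole-number ranges.
    """
    if not 0 <= score <= 10:
        raise ValueError("severity score must be between 0 and 10")
    if score >= 9:
        return "CRITICAL"  # data loss, security breach, production down
    if score >= 7:
        return "HIGH"      # significant damage, hours lost
    if score >= 4:
        return "MEDIUM"    # wrong output, wasted time
    return "LOW"           # minor annoyance, easily caught


if __name__ == "__main__":
    for s in (10, 7, 5, 2):
        print(s, severity_label(s))
```

Checking the lower bounds from highest to lowest keeps each band to a single comparison and leaves no gaps between ranges.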

Data sources

Incidents are sourced from:

GitHub PRs and issues — AI-generated code that broke production
Developer reports — First-hand accounts from users of AI coding agents
Social media — Documented failures shared on X/Twitter, Reddit, Hacker News
Benchmarks — Systematic testing of agent capabilities
Session pastes — Full conversation logs showing agent failures
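
To make the shape of an incident record more concrete, here is one way the attributes described on this page (agent, severity score, failure mode, root cause, source, verification status) might be modeled. It is a sketch under stated assumptions: the class names, field names, and all example values are hypothetical, and the real STUPID-ID format is not specified here, so a placeholder string stands in for it.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    """The five source categories listed above (enum names are hypothetical)."""
    GITHUB = "GitHub PRs and issues"
    DEVELOPER_REPORT = "Developer reports"
    SOCIAL_MEDIA = "Social media"
    BENCHMARK = "Benchmarks"
    SESSION_PASTE = "Session pastes"


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are assumptions."""
    incident_id: str      # placeholder; the real STUPID-ID format is not given
    agent: str            # which AI coding agent was involved
    severity: float       # 0-10 score on the CVSS-inspired scale
    failure_mode: str
    root_cause: str
    source: SourceType
    verified: bool = False  # set once checked against source evidence

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-10 range.
        if not 0 <= self.severity <= 10:
            raise ValueError("severity must be between 0 and 10")


# Made-up example values, for illustration only.
example = Incident(
    incident_id="STUPID-EXAMPLE",
    agent="ExampleAgent",
    severity=9.0,
    failure_mode="deleted migration files",
    root_cause="unknown",
    source=SourceType.DEVELOPER_REPORT,
)
```

Making the record immutable (frozen=True) is a design choice for this sketch: a published incident would then be revised by creating a new record rather than editing one in place.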