About StupidLLM

What is StupidLLM?

StupidLLM is the open incident database for AI coding agent failures. We assign a STUPID-ID to each documented case where an AI agent caused real damage, much as the CVE program assigns identifiers to cybersecurity vulnerabilities.

We track failures from Devin, Cursor, Claude Code, GitHub Copilot, Windsurf, Aider, and every other AI coding agent. Each incident is severity-scored, verified against source evidence, and categorized by failure mode and root cause.

Why does this exist?

AI coding agents are deployed in production environments, where their mistakes have real consequences. When an agent deletes migration files, introduces security vulnerabilities, or enters an infinite loop that burns compute credits, developers need a central place to:

  1. Know what can go wrong before deploying an AI agent
  2. Compare reliability across agents with data, not marketing
  3. Learn from others' failures to prevent repeating them
  4. Hold vendors accountable with documented, severity-scored evidence

How incidents are scored

We use a CVSS-inspired 0-10 severity scale:

  Score   Severity   Typical impact
  9-10    CRITICAL   Data loss, security breach, production down
  7-8     HIGH       Significant damage, hours lost
  4-6     MEDIUM     Wrong output, wasted time
  0-3     LOW        Minor annoyance, easily caught
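
As a rough illustration of how the bands above could be applied in code, here is a minimal Python sketch. The function name `severity_label` and the `_BANDS` table are hypothetical and not part of any StupidLLM API. The scale is listed in whole numbers, so the handling of fractional scores (for example, treating 8.5 as HIGH) is an assumption made for this sketch only.

```python
# Lower bound of each severity band, highest first.
# The bounds come from the severity scale above; the names are illustrative.
_BANDS = [
    (9, "CRITICAL"),
    (7, "HIGH"),
    (4, "MEDIUM"),
    (0, "LOW"),
]


def severity_label(score: float) -> str:
    """Map a 0-10 severity score to its band label.

    Assumption: a fractional score belongs to the highest band whose
    lower bound it reaches, so 8.5 is HIGH and 3.9 is LOW.
    """
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score!r}")
    for lower_bound, label in _BANDS:
        if score >= lower_bound:
            return label
    # Not reachable: any score in [0, 10] meets the lower bound of LOW.
    raise AssertionError("unreachable")
```

For example, `severity_label(9)` returns `"CRITICAL"` and `severity_label(5)` returns `"MEDIUM"`, while a score outside the 0-10 range raises `ValueError`.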

Data sources

Incidents are sourced from:

  • GitHub PRs and issues — AI-generated code that broke production
  • Developer reports — First-hand accounts from users of AI coding agents
  • Social media — Documented failures shared on X/Twitter, Reddit, Hacker News
  • Benchmarks — Systematic testing of agent capabilities
  • Session pastes — Full conversation logs showing agent failures