StupidLLM

The incident database for AI agent failures

When Devin deletes your migration files, when Cursor enters an infinite loop, when Copilot leaks your API keys — we document it. Severity-scored, verified, and searchable.

Incidents Documented: 60

Avg Severity: 6.7/10

Agents Tracked: 24


Latest Incidents

STUPID-2026-0027 10/10 Gemini-cli

Gemini CLI silently executed arbitrary code from an untrusted repo (CVE-2026-12537, CVSS 10.0)

STUPID-2026-0029 10/10 Cursor

Malicious cloned repository triggered code execution in Cursor on Windows

STUPID-2026-0044 10/10 Gpt-sol

GPT-5.6-Sol 'accidentally deleted almost ALL' of a tester's Mac files during OpenAI's Ultra mode trial

STUPID-2026-0025 10/10 Cursor

Cursor AI agent deleted PocketOS's entire production database and backups in 9 seconds

STUPID-2026-0026 10/10 Replit

Replit AI agent wiped SaaStr's production database during a code freeze, then hid the rollback

STUPID-2026-0060 2.2/10 Multiple-agents

The runaway-cost pattern, quantified: agentic coding tools burn 10-100x more tokens and can rival developer pay

STUPID-2026-0056 4.1/10 Unknown-agent

An AI agent spun up duplicate CloudFormation stacks on every error and ran up a $6,531 AWS bill

STUPID-2026-0054 4.1/10 Multiple-agents

Two AI agents ping-ponged for 11 days and ran up a $47,000 bill — neither noticed anything wrong

STUPID-2026-0050 3.3/10 Multiple-agents

Cyera study: 344 verified enterprise agent-damage cases, 188 with no attacker involved

STUPID-2026-0045 2.2/10 Claude-code

Anthropic admitted a month of Claude Code degradation: lost context, repeated steps, burned usage

Highest Severity

STUPID-2026-0027 10/10 Gemini-cli

Gemini CLI silently executed arbitrary code from an untrusted repo (CVE-2026-12537, CVSS 10.0)

Security Vulnerability

STUPID-2026-0029 10/10 Cursor

Malicious cloned repository triggered code execution in Cursor on Windows

Security Vulnerability

STUPID-2026-0044 10/10 Gpt-sol

GPT-5.6-Sol 'accidentally deleted almost ALL' of a tester's Mac files during OpenAI's Ultra mode trial

Destructive Action

STUPID-2026-0025 10/10 Cursor

Cursor AI agent deleted PocketOS's entire production database and backups in 9 seconds

Destructive Action

STUPID-2026-0026 10/10 Replit

Replit AI agent wiped SaaStr's production database during a code freeze, then hid the rollback

Destructive Action

What is StupidLLM?

StupidLLM is the open incident database for AI coding agent failures. Like CVE for cybersecurity vulnerabilities, we assign STUPID-IDs to documented cases where AI agents like Devin, Cursor, Claude Code, GitHub Copilot, Windsurf, and Aider cause real damage — deleted files, security vulnerabilities, infinite loops, wasted resources, and broken production systems.

Every incident is severity-scored using our CVSS-inspired rating system, verified against source evidence, and searchable by agent, failure mode, and root cause. We track reliability trends across agents so developers and enterprises can make informed decisions about which AI tools to trust.

How are AI agent incidents scored?

Every incident is severity-scored on a 0-10 scale using a CVSS-inspired rating system. Scores of 9-10 are critical, 7-8 are high, 4-6 are medium, and 0-3 are low severity. Incidents are verified against source evidence and categorized by failure mode (hallucination, destructive action, infinite loop, etc.) and root cause.
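The bands above can be sketched as a small lookup. This is only an illustration, not StupidLLM's actual scoring code: the function name `severity_band` is hypothetical, and because the bands are stated as whole-number ranges while listed scores are fractional (2.2, 4.1), the sketch assumes each band begins at its lower bound.

```python
def severity_band(score: float) -> str:
    """Map a 0-10 severity score to one of the four named bands."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    # Assumption: a band starts at its lower bound, so 8.5 counts as
    # "high" and 3.3 as "low". The page only gives whole-number ranges.
    if score >= 9:
        return "critical"
    if score >= 7:
        return "high"
    if score >= 4:
        return "medium"
    return "low"


for score in (10, 4.1, 2.2):
    print(score, severity_band(score))
```

Under that assumption, the 10/10 incidents listed on this page are critical, the 4.1 incidents are medium, and the 2.2 and 3.3 incidents are low.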

Which AI coding agent has the most failures?

Visit the StupidLLM dashboard for live rankings of AI agents by documented failures. We track 60 incidents across 24 agents, including Devin, Cursor, Claude Code, GitHub Copilot, Windsurf, and Aider, with an average severity score and risk level for each.
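As a rough sketch of how such a ranking could be computed, the snippet below groups the ten incidents listed on this page by agent label and reports the count and mean severity for each. The function name `rank_agents` and the sort order (most incidents first, then highest average) are assumptions; the dashboard's real method and its full 60-incident data set are not shown here.

```python
from collections import defaultdict

# (agent label, severity) pairs copied from the ten incidents listed on this page.
LISTED = [
    ("Gemini-cli", 10.0),
    ("Cursor", 10.0),
    ("Gpt-sol", 10.0),
    ("Cursor", 10.0),
    ("Replit", 10.0),
    ("Multiple-agents", 2.2),
    ("Unknown-agent", 4.1),
    ("Multiple-agents", 4.1),
    ("Multiple-agents", 3.3),
    ("Claude-code", 2.2),
]


def rank_agents(incidents):
    """Return (agent, incident_count, mean_severity) rows, busiest agent first."""
    scores = defaultdict(list)
    for agent, severity in incidents:
        scores[agent].append(severity)
    rows = [
        (agent, len(values), round(sum(values) / len(values), 1))
        for agent, values in scores.items()
    ]
    # Assumed ordering: incident count, then mean severity, then name.
    rows.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return rows


for agent, count, mean in rank_agents(LISTED):
    print(f"{agent:16} {count} incident(s), avg severity {mean}")
```

A count-based ranking and a severity-based ranking can disagree: in this sample the "Multiple-agents" label has the most entries but a low average, while Cursor has fewer entries that are all 10/10.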

How can I report an AI agent failure?

You can report an AI agent incident by providing the agent name, what you asked it to do, what it actually did, and the severity of the impact. Source URLs (GitHub PRs, tweets, blog posts) help us verify incidents. Each report receives a STUPID-ID for tracking.
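The fields named above could be captured in a structure like the following. This is a hypothetical sketch: the class `IncidentReport`, its field names, and the validation rules are illustrative and do not reflect StupidLLM's actual submission format. Source URLs are treated as optional because the text says they help verification rather than being required.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentReport:
    """Hypothetical shape of a report, based on the fields described above."""
    agent: str        # agent name, e.g. "Cursor"
    requested: str    # what you asked the agent to do
    actual: str       # what it actually did
    severity: float   # impact on the 0-10 scale
    source_urls: list[str] = field(default_factory=list)  # optional evidence

    def problems(self) -> list[str]:
        """Return a list of validation problems; empty means the report is usable."""
        found = []
        for name in ("agent", "requested", "actual"):
            if not getattr(self, name).strip():
                found.append(f"{name} is required")
        if not 0.0 <= self.severity <= 10.0:
            found.append("severity must be between 0 and 10")
        return found


report = IncidentReport(
    agent="Cursor",
    requested="Rename a column in the staging database",
    actual="Dropped the table instead",
    severity=8.0,
    source_urls=["https://example.com/incident-writeup"],  # placeholder URL
)
print(report.problems())
```

Returning a list of problems rather than raising on the first one lets a submitter fix everything in a single pass.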

AI Agent Failure Modes

Hallucination

Destructive Action

Infinite Loop

Security Vulnerability

Scope Explosion

Data Loss