Incident Database

20 documented AI agent failures

STUPID-2026-0006 10.0/10 CRITICAL devin security vulnerability

Devin confidently shipped code that passed tests but had a SQL injection vulnerability

Tasked with adding a search feature, Devin built it using string concatenation for SQL queries instead of parameterized queries. All functional tests passed because the tests didn't include malicious ...
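
The gap between "tests pass" and "query is safe" can be sketched as follows. This is a minimal illustration, not the code from the incident: it assumes Python's standard `sqlite3` module, and the `products` table and the `search_unsafe`, `search_safe`, and `make_demo_db` helpers are hypothetical names chosen for the example.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT)")
    conn.executemany(
        "INSERT INTO products VALUES (?)", [("widget",), ("gadget",)]
    )
    return conn


def search_unsafe(conn, term):
    # Vulnerable: user input is concatenated directly into the SQL text.
    query = "SELECT name FROM products WHERE name LIKE '%" + term + "%'"
    return [row[0] for row in conn.execute(query)]


def search_safe(conn, term):
    # Parameterized: the value is bound by the driver and never parsed as SQL.
    query = "SELECT name FROM products WHERE name LIKE ?"
    return [row[0] for row in conn.execute(query, ("%" + term + "%",))]


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "zzz' OR 1=1 --"
    print(search_unsafe(conn, "wid"), search_safe(conn, "wid"))
    print(search_unsafe(conn, payload), search_safe(conn, payload))
```

With a benign term such as `wid`, both functions return the same rows, which is why a purely functional test suite would not tell them apart. Only an input that contains SQL syntax separates the two.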

STUPID-2026-0001 10.0/10 CRITICAL devin destructive action

Devin deleted all migration files during auth refactor

When asked to refactor authentication middleware to use JWT tokens, Devin interpreted 'refactor' as 'rewrite from scratch' and deleted all Alembic migration files in alembic/versions/. The team lost 6...

STUPID-2026-0003 10.0/10 CRITICAL claude-code destructive action

Claude Code ran rm -rf on test fixtures thinking they were temp files

Asked to clean up temporary test artifacts, Claude Code identified the tests/fixtures/ directory as temporary files and ran rm -rf on it. The fixtures contained 3 months of carefully curated test data...

STUPID-2026-0004 10.0/10 CRITICAL github-copilot security vulnerability

Copilot autocompleted AWS credentials into public repository

While a developer was writing an AWS configuration file, Copilot suggested a completion that included what appeared to be real AWS access keys. The developer accepted the suggestion without reviewing ...

STUPID-2026-0017 10.0/10 CRITICAL devin destructive action

Devin replaced entire medical website with unrelated renal care site

Devin submitted a PR to raices-medicas-web that completely replaced the existing Raices Medicas landing page with an entirely different website for a "Renal Care Institute" focused on dialysis certifi...

STUPID-2026-0020 8.4/10 HIGH amazon-ai-agent logic error

Amazon AI coding agent mistake blamed on human employees

An Amazon AI coding agent made a mistake significant enough to be reported by The Verge. Amazon reportedly blamed human employees for the AI agent's error rather than acknowledging the tool's limitati...

STUPID-2026-0007 7.5/10 HIGH windsurf ignored instructions

Windsurf ignored .gitignore and committed node_modules and .env

While setting up a new Next.js project, Windsurf ran git add -A and committed 47,000 files including the entire node_modules directory and a .env file containing database credentials and API keys.
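
One possible guard against this class of commit is a check over the staged file list before committing. The sketch below is an assumption-laden example rather than anything Windsurf provides: `risky_paths`, `BLOCKED_DIRS`, and `BLOCKED_FILES` are hypothetical names, and the input is assumed to be repository-relative paths such as those printed by `git diff --cached --name-only`.

```python
from pathlib import PurePosixPath

# Hypothetical deny-lists for this illustration; a real project would tune them.
BLOCKED_DIRS = {"node_modules"}
BLOCKED_FILES = {".env"}


def risky_paths(staged_paths):
    """Return the staged paths that sit under a blocked directory
    or whose file name is on the blocked-file list."""
    flagged = []
    for raw in staged_paths:
        parts = PurePosixPath(raw).parts
        if not parts:
            continue  # skip empty entries
        in_blocked_dir = bool(BLOCKED_DIRS.intersection(parts[:-1]))
        is_blocked_file = parts[-1] in BLOCKED_FILES
        if in_blocked_dir or is_blocked_file:
            flagged.append(raw)
    return flagged


if __name__ == "__main__":
    staged = ["src/app.tsx", "node_modules/react/index.js", ".env"]
    print(risky_paths(staged))
```

A pre-commit hook could call a function like this and abort when the returned list is non-empty.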

STUPID-2026-0019 7.5/10 HIGH claude-code security vulnerability

Claude Opus 4.5 leaked API key in console logs during YouTube scraper build

While building a YouTube scraper, Claude Opus 4.5 implemented logging that printed the API key in plain text to the console output. The developer had to add explicit AGENTS.md rules t...
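
One way to keep a key out of console output is to redact known secrets at the logging layer. The sketch below assumes Python's standard `logging` module; `RedactSecretsFilter`, `build_logger`, the logger name `scraper_demo`, and the sample key are all hypothetical and are not taken from the incident.

```python
import io
import logging


class RedactSecretsFilter(logging.Filter):
    """Replace known secret values in a log record before it is emitted."""

    def __init__(self, secrets):
        super().__init__()
        self._secrets = [s for s in secrets if s]

    def filter(self, record):
        # Render the final message first so secrets passed as args are caught too.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "[REDACTED]")
        record.msg = message
        record.args = ()
        return True  # keep the record, now redacted


def build_logger(secrets, stream):
    # Hypothetical helper: one logger, one stream handler, one redaction filter.
    logger = logging.getLogger("scraper_demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactSecretsFilter(secrets))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    out = io.StringIO()
    log = build_logger(["sk-demo-123"], out)
    log.info("Requesting videos with key=%s", "sk-demo-123")
    print(out.getvalue())
```

This only covers secrets the filter has been told about, so it complements, rather than replaces, never logging credentials in the first place.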

STUPID-2026-0005 6.3/10 MEDIUM aider wrong file

Aider modified wrong file — edited production config instead of dev config

Asked to update the database connection timeout in the development config, Aider found config/production.yml first (alphabetically) and modified it instead of config/development.yml. The change was de...

STUPID-2026-0014 5.8/10 MEDIUM devin infinite loop

Devin CI workflow caused 836-comment spam storm on single PR

A Devin PR to migrate a project to GitHub Container Registry on arnaudlh/rover generated 836 comments — overwhelmingly automated CI feedback loops and Devin auto-responses. The PR was never merged. Th...

STUPID-2026-0002 4.1/10 MEDIUM cursor infinite loop

Cursor entered infinite edit loop burning $200 in API costs

While fixing a CSS layout issue, Cursor Agent got stuck in a loop: it would edit a Tailwind class, see the lint warning about the previous class it removed, re-add it, see the original issue, remove i...
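
An edit/revert cycle like this one can be caught by noticing that the file keeps returning to a state it has already been in. The following is a minimal sketch under that assumption; `EditLoopDetector` and its `max_repeats` parameter are hypothetical and do not describe how Cursor works internally.

```python
import hashlib


class EditLoopDetector:
    """Flag a probable loop when the same file content recurs too often."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self._seen = {}  # content digest -> number of times observed

    def record(self, file_content):
        """Record the content after an edit; return True if a loop is likely."""
        digest = hashlib.sha256(file_content.encode("utf-8")).hexdigest()
        self._seen[digest] = self._seen.get(digest, 0) + 1
        return self._seen[digest] > self.max_repeats


if __name__ == "__main__":
    detector = EditLoopDetector(max_repeats=2)
    # The agent alternates between two versions of the same class list.
    for state in ["flex gap-2", "flex", "flex gap-2", "flex", "flex gap-2"]:
        print(state, detector.record(state))
```

An agent harness could stop and ask for human input once `record` returns True, instead of continuing to spend on further edits.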

STUPID-2026-0009 3.6/10 LOW cursor scope explosion

Cursor Agent rewrote entire file instead of making targeted edit

Asked to fix a single typo in a 2000-line configuration file, Cursor Agent decided to 'improve' the entire file. It reformatted all YAML, reordered keys alphabetically, removed comments that contained...

STUPID-2026-0011 3.4/10 LOW devin logic error

Devin PR broke ledger list API and created buckets on deleted resources

Devin submitted a PR to implement bucket deletion in Formance Ledger. The maintainer (gfyrag) found multiple issues: the ledger list endpoint was broken by the changes, the PR allowed creating new led...

STUPID-2026-0013 3.4/10 LOW devin scope explosion

Devin attempted to build entire Figma clone from scratch — 3 rejected attempts

Devin submitted 3 separate PRs to andrewgcodes/vigma, each attempting to build a full Figma-like design tool from scratch. PR #4 ("Full-featured Vigma design editor with Apple/Stripe style UI"), PR #5...

STUPID-2026-0010 2.9/10 LOW autogpt infinite loop

AutoGPT spent $450 on API calls trying to build a todo app

Given the task 'build a todo app', AutoGPT entered a planning loop where it kept generating increasingly detailed specifications, architecture documents, and technology comparisons. It created 67 plan...

STUPID-2026-0012 2.2/10 LOW devin infinite loop

Devin repeatedly submitted identical docs PRs that kept getting rejected

Devin submitted 5 nearly identical PRs to hailbee/datastack-docs-drift-demo, each titled "fix: update docs to match current API behavior." Each was closed without merge, but Devin kept submitting the ...

STUPID-2026-0016 2.1/10 LOW devin hallucination

Devin docs PR rejected by Prefect maintainers — documented behavior from removed feature

Devin submitted a docs PR to PrefectHQ/prefect (21K+ stars) explaining a Kubernetes worker behavior. The PR was closed because it documented a feature that had already been removed in recent versions....

STUPID-2026-0008 2.1/10 LOW claude-code hallucination

Claude Code hallucinated a non-existent npm package and installed it

While building a date picker component, Claude Code suggested using 'react-temporal-picker', a package that doesn't exist on npm. It proceeded to write import statements and component code using this ...

STUPID-2026-0015 2.0/10 LOW devin ignored instructions

Devin cross-platform CI PR went through an 8-comment review cycle without landing

Devin submitted a cross-platform CI workflow to rjmurillo/Qwiq using matrix strategy for Ubuntu and Windows. The PR received 8 comments of review discussion but was ultimately closed without merging. ...

STUPID-2026-0018 1.4/10 LOW devin ignored instructions

Devin added a pointless "Hello!" page to a disease prediction platform

Devin submitted a PR to dhis2-chap/chap-frontend (a disease prediction platform used by health organizations) that added a "Hello!" page at /hello. The page displayed nothing but a header saying "Hell...
