Incident Database
20 documented AI agent failures
Devin confidently shipped code that passed tests but had a SQL injection vulnerability
Tasked with adding a search feature, Devin built it using string concatenation for SQL queries instead of parameterized queries. All functional tests passed because the tests didn't include malicious ...
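For illustration, a minimal sketch of the parameterized alternative using Python's standard sqlite3 module. The `items` table, the `search_items` function, and the demo data are hypothetical and not taken from the incident; the point is only that a placeholder binds the search term as data, so a quote-laden input cannot change the query.

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory table used only for this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (name TEXT)")
    conn.executemany("INSERT INTO items VALUES (?)", [("apple",), ("banana",)])
    return conn


def search_items(conn, term):
    # The "?" placeholder passes `term` to the driver as a bound value,
    # so it is never spliced into the SQL text.
    pattern = f"%{term}%"
    cur = conn.execute("SELECT name FROM items WHERE name LIKE ?", (pattern,))
    return [row[0] for row in cur.fetchall()]
```

A test suite that feeds an input such as `' OR '1'='1` to the search function exercises the case that purely functional tests tend to miss.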
Devin deleted all migration files during auth refactor
When asked to refactor authentication middleware to use JWT tokens, Devin interpreted 'refactor' as 'rewrite from scratch' and deleted all Alembic migration files in alembic/versions/. The team lost 6...
Claude Code ran rm -rf on test fixtures thinking they were temp files
Asked to clean up temporary test artifacts, Claude Code identified the tests/fixtures/ directory as temporary files and ran rm -rf on it. The fixtures contained 3 months of carefully curated test data...
Copilot autocompleted AWS credentials into public repository
While a developer was writing an AWS configuration file, Copilot suggested a completion that included what appeared to be real AWS access keys. The developer accepted the suggestion without reviewing ...
Devin replaced entire medical website with unrelated renal care site
Devin submitted a PR to raices-medicas-web that completely replaced the existing Raices Medicas landing page with an entirely different website for a "Renal Care Institute" focused on dialysis certifi...
Amazon AI coding agent mistake blamed on human employees
An Amazon AI coding agent made a mistake significant enough to be reported by The Verge. Amazon reportedly blamed human employees for the AI agent's error rather than acknowledging the tool's limitati...
Windsurf ignored .gitignore and committed node_modules and .env
While setting up a new Next.js project, Windsurf ran git add -A and committed 47,000 files including the entire node_modules directory and a .env file containing database credentials and API keys.
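One possible guard, sketched under assumptions: a check that runs before committing and flags staged paths that look like dependency directories or secret files. The function name, the blocked-name sets, and the idea of feeding it the output of `git diff --cached --name-only` are illustrative choices, not part of the incident report.

```python
from pathlib import PurePosixPath

# Hypothetical deny-lists; a real project would tune these.
BLOCKED_DIRS = {"node_modules"}
BLOCKED_NAMES = {".env"}


def find_risky_paths(staged_paths):
    """Return the staged paths that sit under a blocked directory
    or whose file name is on the blocked list."""
    risky = []
    for raw in staged_paths:
        path = PurePosixPath(raw)
        in_blocked_dir = bool(BLOCKED_DIRS.intersection(path.parts))
        if in_blocked_dir or path.name in BLOCKED_NAMES:
            risky.append(raw)
    return risky
```

If the returned list is non-empty, the commit can be aborted and the paths reported for human review.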
Claude Opus 4.5 leaked API key in console logs during YouTube scraper build
While building a YouTube scraper, Claude Opus 4.5 added logging that printed the API key in plain text to the console output. The developer had to add explicit AGENTS.md rules t...
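As a sketch of one mitigation, the snippet below uses a filter from Python's standard logging module to mask a known secret before a record is emitted. The class name and the `[REDACTED]` marker are assumptions made for this example; the incident report does not say how the scraper's logging was written or fixed.

```python
import logging


class RedactSecretFilter(logging.Filter):
    """Replace a known secret value with a marker in every log record."""

    def __init__(self, secret):
        super().__init__()
        self._secret = secret

    def filter(self, record):
        # Render the final message first so secrets passed as
        # format arguments are caught too.
        message = record.getMessage()
        if self._secret and self._secret in message:
            record.msg = message.replace(self._secret, "[REDACTED]")
            record.args = None  # message is already fully formatted
        return True  # keep the record, now redacted
```

Attaching the filter to a handler with `handler.addFilter(...)` applies it to every record that handler emits, including records propagated from child loggers.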
Aider modified wrong file, editing production config instead of dev config
Asked to update the database connection timeout in the development config, Aider found config/production.yml first (alphabetically) and modified it instead of config/development.yml. The change was de...
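The failure mode here is picking the first match rather than the requested one. A minimal sketch of the alternative, selecting a config file by an explicit environment name and refusing to guess; `pick_config` and its error behavior are hypothetical and do not describe how Aider resolves files.

```python
from pathlib import PurePosixPath


def pick_config(candidates, environment):
    """Return the single candidate whose file stem equals `environment`.

    Raises LookupError instead of falling back to the first
    (e.g. alphabetically earliest) file when there is no exact match.
    """
    matches = [c for c in candidates if PurePosixPath(c).stem == environment]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one config for {environment!r}, found {len(matches)}"
        )
    return matches[0]
```

Failing loudly on zero or multiple matches turns a silent wrong-file edit into an error the operator sees before anything is written.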
Devin CI workflow caused 836-comment spam storm on single PR
A Devin PR to migrate a project to GitHub Container Registry on arnaudlh/rover generated 836 comments — overwhelmingly automated CI feedback loops and Devin auto-responses. The PR was never merged. Th...
Cursor entered infinite edit loop burning $200 in API costs
While fixing a CSS layout issue, Cursor Agent got stuck in a loop: it would edit a Tailwind class, see the lint warning about the previous class it removed, re-add it, see the original issue, remove i...
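A loop of this kind revisits file states it has already produced. The sketch below shows one way a harness might detect that, by hashing the file contents after each edit and counting repeats. The class, its `max_repeats` threshold, and the hashing approach are assumptions for illustration, not a description of Cursor's internals.

```python
import hashlib


class EditLoopDetector:
    """Flag an edit loop when the same file content keeps reappearing."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self._seen = {}  # content digest -> number of times observed

    def record(self, file_text):
        # Identical content always yields the same digest, so a repeated
        # digest means the agent has returned to an earlier state.
        digest = hashlib.sha256(file_text.encode("utf-8")).hexdigest()
        count = self._seen.get(digest, 0) + 1
        self._seen[digest] = count
        return count > self.max_repeats
```

When `record` returns True, the harness can stop the agent and hand control back to the developer instead of continuing to spend on API calls.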
Cursor Agent rewrote entire file instead of making targeted edit
Asked to fix a single typo in a 2000-line configuration file, Cursor Agent decided to 'improve' the entire file. It reformatted all YAML, reordered keys alphabetically, removed comments that contained...
Devin PR broke ledger list API and created buckets on deleted resources
Devin submitted a PR to implement bucket deletion in Formance Ledger. The maintainer (gfyrag) found multiple issues: the ledger list endpoint was broken by the changes, the PR allowed creating new led...
Devin attempted to build entire Figma clone from scratch: 3 rejected attempts
Devin submitted 3 separate PRs to andrewgcodes/vigma, each attempting to build a full Figma-like design tool from scratch. PR #4 ("Full-featured Vigma design editor with Apple/Stripe style UI"), PR #5...
AutoGPT spent $450 on API calls trying to build a todo app
Given the task 'build a todo app', AutoGPT entered a planning loop where it kept generating increasingly detailed specifications, architecture documents, and technology comparisons. It created 67 plan...
Devin repeatedly submitted identical docs PRs that kept getting rejected
Devin submitted 5 nearly identical PRs to hailbee/datastack-docs-drift-demo, each titled "fix: update docs to match current API behavior." Each was closed without merge, but Devin kept submitting the ...
Devin docs PR rejected by Prefect maintainers — documented behavior from removed feature
Devin submitted a docs PR to PrefectHQ/prefect (21K+ stars) explaining a Kubernetes worker behavior. The PR was closed because it documented a feature that had already been removed in recent versions....
Claude Code hallucinated a non-existent npm package and installed it
While building a date picker component, Claude Code suggested using 'react-temporal-picker', a package that doesn't exist on npm. It proceeded to write import statements and component code using this ...
Devin cross-platform CI PR went through 8-comment review cycle without landing
Devin submitted a cross-platform CI workflow to rjmurillo/Qwiq using matrix strategy for Ubuntu and Windows. The PR received 8 comments of review discussion but was ultimately closed without merging. ...
Devin added a pointless "Hello!" page to a disease prediction platform
Devin submitted a PR to dhis2-chap/chap-frontend (a disease prediction platform used by health organizations) that added a "Hello!" page at /hello. The page displayed nothing but a header saying "Hell...