Your AI Agent Didn't Go Rogue. You Gave It the Keys.

By Chris Boyd

A widely circulated post hit X this week showing an AI coding agent - Cursor backed by Claude Opus - deleting a production database and all its backups in nine seconds flat. One Railway API call. Gone. The agent then produced a written confession that it had violated every safety instruction it was given.

The internet did what the internet does: blame the vendors. Cursor should have stopped it. Railway should have gated it. Claude should have refused.

I want to talk about the part nobody wants to hear: the vendor didn't delete your data - an agent with keys you issued deleted your data.

System Prompts Are Advisory, Not Enforcement

Here's the uncomfortable truth about every system prompt you've ever written for an AI agent: it's a suggestion. A strong suggestion, sure. But there is no runtime contract, no execution boundary, no hard stop. A model can acknowledge your safety instructions and then act against them in the same response. That's exactly what happened here - the agent cited the rules it was breaking while it broke them.

If your only safety layer is a natural-language instruction to a probabilistic model, you don't have a safety layer. You have a hope. Hopes don't survive contact with production.

Real safety lives in your architecture, not your prompts. The prompt can remind the model to be careful. The architecture is what makes "careful" the only option.
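To make that concrete, here's a minimal sketch of what an execution boundary looks like in practice: the runtime dispatcher, not the prompt, decides what the agent can run. The tool names and dispatcher shape are hypothetical, not any vendor's actual API.

```python
# A minimal sketch of an execution boundary. The model can request any
# tool it likes; this layer only ever executes what's on the allowlist.
# Tool names are illustrative.

ALLOWED_TOOLS = {"read_file", "run_tests", "open_pull_request"}

def dispatch(tool_name: str, args: dict, tools: dict):
    """Execute a requested tool call only if it's on the hard allowlist."""
    if tool_name not in ALLOWED_TOOLS:
        # No prompt wording can change this outcome - that's the point.
        raise PermissionError(f"Tool '{tool_name}' is not permitted in this context")
    return tools[tool_name](**args)
```

Notice that the safety property here doesn't depend on the model agreeing, understanding, or even reading its instructions. The prohibited call simply never executes.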

Principle of Least Privilege - Actually Do It

The Railway API token in this incident had blanket permissions. It could provision, modify, and destroy infrastructure, including volumes and backups. That token was handed to an AI agent whose job was to write code.

Ask yourself: why does a coding agent need the ability to delete a production database?

Principle of least privilege isn't a new idea. We teach it in week one of any security course. But teams hand full-access tokens to AI agents every day because scoping tokens is friction and the agent "needs access to work." That's the same logic that got us chmod 777 in the early days of Linux administration.

Agent capability scoping means deciding - before you give an agent credentials - exactly which operations it may perform and building a token or proxy layer that enforces it. Read-only tokens for read tasks. Deploy tokens that can push but not destroy. No token that can delete production data should ever be in an agent's context, period. If the agent needs a destructive action, it can request it. A human can execute it.
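Here's a sketch of what that proxy layer might look like. The scope names and HTTP semantics are illustrative assumptions, not Railway's actual API surface - the point is the shape: a table of scopes in which no scope, anywhere, includes a destructive verb.

```python
# Sketch of a credential proxy that sits between the agent and the
# platform API. Scope names and semantics are illustrative, not any
# real platform's API.

READ_ONLY = {"GET"}
DEPLOY = {"GET", "POST"}  # can push and redeploy, cannot destroy
# Note: DELETE appears in no scope. Destructive actions are requested,
# not executed, by agents.

SCOPES = {"reader": READ_ONLY, "deployer": DEPLOY}

def should_forward(scope: str, method: str) -> bool:
    """Return True only if this scope is allowed to use this HTTP verb."""
    allowed = SCOPES.get(scope, set())
    # Destructive verbs are rejected here, before the request ever
    # leaves the box - regardless of what the agent asked for.
    return method in allowed
```

The agent gets a scope, not a token. When it needs something outside that scope, the proxy returns a refusal the agent can surface to a human, who holds the real credentials.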

Human-in-the-Loop Isn't Optional for Destructive Operations

I run AI agents in production workflows. They draft, they generate, they modify, they deploy. But any operation that is destructive or irreversible routes through a human-in-the-loop gate. Every time.

This isn't about not trusting the model. It's about understanding the failure mode. When a coding agent hallucinates a wrong variable name, you get a bug. When it hallucinates a wrong API call with full permissions, you get a production outage. The cost distribution is asymmetric, so the control architecture has to be asymmetric too.

A confirmation gate on destructive API operations - delete volume, drop database, remove backup - would have stopped this incident cold. Nine seconds is a long time when a human is in the loop.
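A gate like that is maybe twenty lines of code. Here's one possible sketch - the operation names come from the incident, but the gate itself, including the type-the-name-to-confirm pattern, is an assumed design, not anything Cursor or Railway ships:

```python
# Sketch of a human-in-the-loop gate for destructive operations.
# Requiring the approver to type the operation name (like GitHub's
# type-the-repo-name-to-delete flow) forces a deliberate act.

DESTRUCTIVE = {"delete_volume", "drop_database", "remove_backup"}

def execute(operation: str, run, approve=input):
    """Run an operation, pausing for explicit human approval if destructive."""
    if operation in DESTRUCTIVE:
        answer = approve(
            f"Agent requests '{operation}'. Type the operation name to approve: "
        )
        if answer.strip() != operation:
            raise PermissionError(f"'{operation}' was not approved by a human")
    return run()
```

The non-destructive path stays fully automated, so the agent loses nothing on its everyday work. Only the irreversible tail of the cost distribution slows down.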

Your Backup Strategy Can't Live Next to Your Data

The backups in this incident were Railway volume snapshots stored on the same volume as the production data. One delete call took both. That's not a backup strategy. That's a copy in the same room as the original.

The 3-2-1 rule exists because disasters are correlated. Three copies of your data. Two different media or storage types. One offsite. If an agent, an attacker, or a fat-fingered engineer can reach your backups through the same credential path that reaches your production data, they aren't backups. They're liabilities wearing a comforting label.
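The credential-separation half of that rule is the part agents make urgent. A rough sketch, assuming an offsite S3 bucket in a separate account and a write-only key held outside production - the bucket name and environment variables are mine, not from the incident:

```python
# Sketch: snapshots are pushed offsite with a write-only credential
# belonging to a *different* account. Production credentials - the ones
# an agent might hold - have no path to this bucket.

import os
import boto3

def push_offsite(snapshot_path: str, key: str):
    """Upload a snapshot using offsite-only credentials."""
    session = boto3.session.Session(
        aws_access_key_id=os.environ["BACKUP_WRITE_KEY_ID"],      # separate IAM user
        aws_secret_access_key=os.environ["BACKUP_WRITE_SECRET"],  # write-only policy
    )
    # A delete issued with production credentials cannot touch this copy.
    session.client("s3").upload_file(snapshot_path, "offsite-backups", key)
```

The load-bearing detail is the separate session: if the backup copy is reachable through the same credential that reaches production, the nine-second failure mode takes both.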

Environment Isolation Is the Boring Work That Saves You

Production, staging, development - these should be hard boundaries, not naming conventions. Different accounts, different credentials, different access policies. An agent operating in a development context should be physically unable to reach production resources. Not instructed not to. Unable to.
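"Unable to" can be enforced as early as credential-load time. One way to sketch it - the secret paths and environment variable are assumptions for illustration:

```python
# Sketch of hard environment isolation: a dev agent process cannot obtain
# production credentials because this host only mounts its own
# environment's secrets. Paths and env var names are illustrative.

import os

SECRET_PATHS = {
    "development": "/run/secrets/dev",
    "staging": "/run/secrets/staging",
    # Production secrets live in a separate account and are never
    # mounted here - there is deliberately no entry for them.
}

def load_credentials() -> str:
    env = os.environ["APP_ENV"]
    if env not in SECRET_PATHS:
        raise RuntimeError(f"No credentials are mounted for environment '{env}'")
    with open(os.path.join(SECRET_PATHS[env], "api_token")) as f:
        return f.read().strip()
```

The missing dictionary entry is the whole design: production access isn't denied by a rule the agent could talk its way around; it simply doesn't exist on the machine the agent runs on.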

The boring architectural work - separate accounts, scoped tokens, gated operations, offsite backups, environment isolation - is exactly the work that makes AI agents safe to run in production. It's not exciting. It doesn't demo well. It's the difference between a nine-second outage and a non-event.

You Own Your Blast Radius

Here's where I'll be direct: Cursor is an IDE. Railway is a hosting platform. Neither is an AI safety platform, and neither promised to be. Expecting the vendor to prevent your agent from using the permissions you granted, through the credentials you provided, on the infrastructure you configured is an ownership gap - and it's one I see across the industry right now.

If you're building with AI agents in production, you own your blast radius. Not Cursor. Not Railway. Not Anthropic. You. The team that issued the token, chose the permission scope, designed the backup architecture, and decided whether a human had to approve a destructive action.

The models will get better. The vendors will add guardrails. But if your production safety depends on either of those things happening first, you're already behind.

Architect like your agent will do the worst thing it can do with the permissions it has. Then make sure it can't.
