your AI just deleted something. there's no undo.
Autonomous AI agents are powerful enough to act on your behalf. They can also delete, send, and publish things you can't take back. Here's how that happens and what good AI design does about it.
It started with a simple request
"Check this inbox and suggest what you would archive or delete. Don't action until I tell you to."
An AI researcher at Meta Superintelligence Labs said this to her AI agent. Her job, specifically, is thinking about what happens when AI systems do things they shouldn't.
Her AI deleted her inbox.
She sent commands to stop. It kept going. She had to physically run to her computer to kill the process. The emails were already gone.
This story spread because of who it happened to. But the part that actually matters is the mechanics. Because what happened to her inbox is a version of something that can happen with any agentic AI that doesn't have the right guardrails on irreversible actions.
Why agentic AI is different from chat AI
Most AI tools are reactive. You type, it responds. Nothing happens in the world. Worst case, you get a bad answer. You close the tab and move on.
Agentic AI is different. It acts. It sends emails, creates calendar events, moves files, posts messages, makes changes that exist outside the conversation window. That's what makes it genuinely useful. And that's what makes certain mistakes very hard to fix.
There's a category of actions that only go in one direction. Send a message: gone. Delete a file: gone. Post something publicly: people saw it. Confirm a meeting: it's on the calendar now. These actions have consequences that don't come with a reset button.
The question for any AI that acts on your behalf is whether it knows the difference between reversible and irreversible. And whether it treats them differently.
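That distinction can be made mechanical. Here's a minimal sketch, with made-up action names, of how an agent might tag each tool call as reversible or not before deciding whether to pause, treating anything unknown as irreversible:

```python
# Illustrative only: action names and sets are hypothetical, not any
# real product's catalog.

IRREVERSIBLE = {"email.send", "email.delete", "post.publish", "event.confirm"}
REVERSIBLE = {"email.draft", "file.move_to_trash", "event.propose"}

def needs_confirmation(action: str) -> bool:
    # Fail safe: anything not explicitly known to be reversible
    # gets treated as irreversible and requires a confirmation step.
    return action in IRREVERSIBLE or action not in REVERSIBLE

print(needs_confirmation("email.draft"))   # → False
print(needs_confirmation("email.delete"))  # → True
print(needs_confirmation("mystery.op"))    # → True
```

The important design choice is the default: an action the system can't classify should be gated, not waved through.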
Why this happens technically
The reason the researcher's AI kept deleting is something called context compaction.
AI agents run on large language models with a limited working memory window. When a long task fills that window, the model compresses older messages to make room for newer ones. Her original instruction, "don't action until I tell you to," was in those older messages. It got compressed away. The agent lost it. And without that constraint, it continued doing what it thought it was supposed to do: process the inbox.
Nobody wrote code that said "delete everything." The agent just stopped having a reason not to.
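To make the mechanism concrete, here's a toy sketch (not any real agent framework) of a context window that drops its oldest messages when the token budget fills. The safety instruction lives only in message history, so a long enough task silently pushes it out:

```python
# Toy model of context compaction. The tokenizer and budget are crude
# stand-ins chosen to make the effect visible in a few lines.

MAX_TOKENS = 50  # tiny budget for illustration

def token_count(message: str) -> int:
    return len(message.split())  # crude stand-in for a real tokenizer

def compact(history: list[str]) -> list[str]:
    """Drop oldest messages until the history fits the budget."""
    while sum(token_count(m) for m in history) > MAX_TOKENS and len(history) > 1:
        history.pop(0)  # the original instruction is the oldest message
    return history

history = ["USER: Check this inbox and suggest deletions. Do not action until I say so."]

# A long task fills the window with tool output...
for i in range(20):
    history.append(f"TOOL: scanned email {i}, candidate for deletion")
    history = compact(history)

# The constraint is gone; the agent now only sees recent tool output.
print(any("Do not action" in m for m in history))  # → False
```

Nothing here is malicious. The compaction step did exactly what it was built to do, and the constraint disappeared as a side effect.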
This isn't a flaw in one particular product. Context compaction is a real limitation of how current large language models work. It affects every agentic AI system. The question is whether the design accounts for it.
If your safety instructions live only in the conversation history, they're fragile. Long tasks, large inboxes, and compaction all put them at risk. Your "don't do anything without asking" rule can disappear mid-task with no warning.
The pattern shows up in smaller ways too
The inbox incident is the most dramatic version of this. But the same failure mode shows up all the time in subtler ways.
An agent drafts and sends an email you meant to review first. An automation posts to a shared channel before you approved the message. A scheduled workflow deletes old files based on a rule that made sense three months ago but not now. Each of these is the same thing: an AI acting on an outdated or incomplete understanding of what you actually wanted.
These failures usually look fine right up until they don't. The agent was doing its job. It just didn't know when to stop and check.
What responsible design looks like
Our position: any action that can't be undone should require explicit confirmation by default. Not as an opt-in setting. As the default the product ships with.
That means before Cloa deletes something, sends a message to another person, publishes content, or makes any change that can't be reversed, it stops and asks. Every time, unless you've specifically decided otherwise for that action type.
We call this confirm mode. Every Cloa integration has three settings: off, confirm, or full auto. Confirm means Cloa figures out what to do, shows you what it's about to do, and waits for you to say go.
Full auto is available for things you've decided you're comfortable with. Your morning briefing going to your own Telegram channel probably doesn't need a daily confirmation step. Sending an email to someone else probably does, at least until you're confident in how Cloa handles your tone and phrasing.
The key thing: you decide where that line is. Cloa doesn't assume.
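Under assumed names (this is a sketch of the concept, not Cloa's actual code), per-integration gating with those three levels might look like this:

```python
# Hypothetical sketch: integration names, modes, and the execute()
# function are illustrative.

from enum import Enum

class Mode(Enum):
    OFF = "off"
    CONFIRM = "confirm"
    FULL_AUTO = "full_auto"

# User-chosen levels, stored per integration.
permissions = {"telegram": Mode.FULL_AUTO, "email": Mode.CONFIRM, "files": Mode.OFF}

pending_approvals = []

def execute(integration: str, action: str) -> str:
    mode = permissions.get(integration, Mode.CONFIRM)  # unknown → safe default
    if mode is Mode.OFF:
        return f"blocked: {integration} is disabled"
    if mode is Mode.CONFIRM:
        pending_approvals.append((integration, action))
        return f"awaiting approval: {action}"
    return f"executed: {action}"

print(execute("telegram", "post morning briefing"))  # → executed: ...
print(execute("email", "send draft to Alex"))        # → awaiting approval: ...
print(execute("files", "delete old exports"))        # → blocked: ...
```

Note that "confirm" doesn't block the agent from thinking; it only blocks the side effect. The plan still gets made and shown to you.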
The fix goes deeper than a setting
The deeper issue isn't just that the AI acted on the wrong instruction. It's that the instruction wasn't durable.
If your permission rules live in the conversation thread, they're vulnerable to the same compaction that caused the inbox incident. A long task shouldn't be able to overwrite your safety boundaries just because it ran out of memory.
Cloa keeps permissions in a persistent layer separate from the conversation. Your permission settings live in your account, not in a chat thread. They can't be compacted away. A long inbox-processing task doesn't change what Cloa is allowed to do, because that's defined somewhere that doesn't move.
That's not a minor implementation detail. It's the difference between safety being a promise and safety being a guarantee.
What to think about before giving AI access to your accounts
If you're connecting any AI agent to your email, calendar, or messaging apps, here are the questions that actually matter:
How does it handle irreversible actions? Does it ask before deleting, sending, or publishing? What happens if a task runs long and context gets compressed?
Where do your permissions live? In the conversation history (fragile) or a persistent settings layer (reliable)?
What's the undo story? If the AI does something unintended, what are your options?
What happens when it makes a mistake? Agents will. The question is whether those mistakes are recoverable.
These aren't hypothetical. They're the difference between an AI that makes your life easier and one you have to sprint across the room to stop.
Frequently asked questions
What is context compaction and why does it matter?
Large language models have a limited working memory window. When a task fills that window, older messages get compressed. If your instructions were in those older messages, they can be lost mid-task. This is a general limitation of current AI systems, not specific to any one product, and something good AI design needs to account for.
How does Cloa prevent instructions from getting lost mid-task?
Cloa stores permissions in a persistent layer separate from the conversation. Your confirm or full-auto settings live in your account, not the chat thread. They can't be lost to context compaction, and they apply consistently across every task regardless of length.
What is confirm mode?
Confirm mode is a permission setting in Cloa where the AI figures out what action to take, shows you what it plans to do, and waits for your approval before doing it. It's the default for actions that affect other people or can't be undone.
Can I set different permissions for different integrations?
Yes. Each Cloa integration has its own permission level. Your calendar might be full auto while your email is on confirm. You can change these settings anytime from the app.
What should I look for when choosing an AI agent?
Look for clear answers to: does it ask before irreversible actions, where are permission settings stored, what happens if a long task loses its instructions, and is safety a default or something you have to configure yourself.