
An old bash trick makes AI coding agents run the commands they just blocked

AI coding agent guardrails fall to GuardFall, a bash trick that bypassed the command safety check in 10 of 11 open-source agents Adversa AI tested.


The safety check inside an AI coding agent is supposed to be the thing that stops it from running a destructive command on your machine. New research shows that check is a text filter, and a shell trick older than the agents it guards walks right past it. Adversa AI calls the technique GuardFall, and it defeated the command guardrail in ten of the eleven open-source AI agents the firm put through its tests. Read it as the security story it is: this is not an AI problem, it is command injection wearing a new costume.

A word filter cannot referee a shell

The agents mostly defend themselves the same way. Before a command runs, they match its text against a list of dangerous patterns and reject anything that hits. The problem is that the shell never runs that text as written. Before execution, bash rewrites it: it expands variables and shortcuts, splits the results into words on the characters in its internal field separator (IFS), and strips the quotation marks.

So a command typed as r''m looks harmless to a filter hunting for the string rm, because the two are not the same characters. The shell then removes the empty quotes and runs rm anyway. Adversa described other flavors of the same idea: smuggling a command in through base64 decoding piped into a shell, or bolting a destructive flag onto an ordinary tool like find or dd. The filter and the interpreter look at two different strings, and the space between them is the entire attack.
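The gap between the two strings can be reproduced in a few lines. The sketch below is illustrative only: the blocklist patterns and function names are hypothetical, not taken from any tested agent, and Python's `shlex.split` stands in for the shell's quote removal (it does not perform variable expansion or globbing).

```python
import re
import shlex
from typing import List

# Hypothetical raw-text blocklist of the kind the article describes.
BLOCKLIST = [re.compile(r"\brm\b"), re.compile(r"\bmkfs\b")]

def text_filter_allows(command: str) -> bool:
    """Return True when no blocklist pattern matches the raw command text."""
    return not any(pattern.search(command) for pattern in BLOCKLIST)

def words_after_quote_removal(command: str) -> List[str]:
    """Approximate the words the shell would run, using POSIX-style tokenizing."""
    return shlex.split(command)

disguised = "r''m -rf ./build"
print(text_filter_allows(disguised))         # True: the raw text never contains "rm"
print(words_after_quote_removal(disguised))  # ['rm', '-rf', './build']
```

The filter approves the command because it inspects the raw characters, while the tokenized form shows the program that would actually run.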

AI coding agent     Safety check bypassed
opencode            Yes
Goose               Yes
Cline               Yes
Roo-Code            Yes
Aider               Yes
Plandex             Yes
Open Interpreter    Yes
OpenHands           Yes
SWE-agent           Yes
Hermes              Yes
Continue            No
Adversa AI tested 11 open-source AI coding and computer-use agents. Source: Adversa AI, The Hacker News.

Why a lab trick is a software supply-chain problem

These agents execute shell commands using the full rights of whatever developer account is driving them, and in continuous integration that account often holds cloud keys. Point one at a repository you do not control and the danger stops being theoretical. The trigger paths Adversa lists are the everyday surfaces of open-source work: instructions buried in a build file that looks ordinary, a booby-trapped reply inside tool documentation, or a project config file such as the one Aider reads straight from the repo and trusts.

The agents in the test hold roughly 548,000 GitHub stars between them, so this is mainstream tooling, not a fringe experiment. Adversa drove the agents with Claude Sonnet 4.6 and carried full attacks through to completion against Plandex and eight of the others. One precondition matters: the agent has to be running with auto-execute turned on or its sandbox switched off, which is exactly how teams wire these into pipelines to get unattended runs. The setting that makes an agent useful in a clean repo is the same setting that hands an attacker a shell in a poisoned one.

This is command injection's third act

We have watched this exact failure twice before. SQL injection worked because an application validated a string that the database parser then read differently. Classic command injection worked because a program cleaned input that the shell then re-expanded. GuardFall is the same bug a third time: a guardrail inspects one form of a command while the interpreter executes another.

The lesson security learned twenty-five years ago, that a denylist of dangerous strings can never keep pace with a parser, did not travel into the AI tooling that reinvented the pattern. A blocklist will always lose here, because a shell offers quote removal, word splitting, variable expansion, filename globbing and decode-and-pipe, and the number of ways to spell rm through those is effectively unbounded. Patching the filter to catch r''m only moves the game to the next spelling. We have seen the broader version of this argument before: the toolchain that ships your code is itself the attack surface.

The one agent that resisted did the textbook thing

Continue was the only tool that held, and how it held is the real fix. Instead of pattern-matching the command text, it parses the command the way bash would before deciding whether to allow it, then blocks destructive operations outright. That is the standard cure for every injection bug ever written: parse first, decide second, so the guardrail and the interpreter agree on what the command actually is.
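As a rough illustration of parse first, decide second, the sketch below tokenizes a command before applying a policy. It is not Continue's implementation: the policy sets and the `decide` function are hypothetical, and `shlex` is only an approximation of bash that handles quoting but not expansion. For that reason the sketch fails closed on any construct it cannot resolve.

```python
import shlex

# Hypothetical policy for illustration; a real one would be longer.
BLOCKED_PROGRAMS = {"rm", "dd", "mkfs", "shred", "sh", "bash", "env", "sudo", "xargs", "eval"}
BLOCKED_FLAGS = {"find": {"-delete", "-exec", "-execdir"}}
# Characters whose effect depends on expansion, substitution, pipes or lists,
# none of which shlex models. Any of them causes a refusal.
UNRESOLVED = set("$`|;&<>(){}*?[]~\\\n")

def decide(command: str) -> str:
    """Return "allow" or "block" based on the tokenized form of the command."""
    if any(ch in UNRESOLVED for ch in command):
        return "block"  # final form is unknown to this sketch, so refuse
    try:
        argv = shlex.split(command)  # removes quotes, joins adjacent pieces
    except ValueError:
        return "block"  # unbalanced quotes
    if not argv or "=" in argv[0]:
        return "block"  # empty command or a leading VAR=value assignment
    program = argv[0].rsplit("/", 1)[-1]  # "/bin/rm" is treated as "rm"
    if program in BLOCKED_PROGRAMS:
        return "block"
    if BLOCKED_FLAGS.get(program, set()) & set(argv[1:]):
        return "block"  # ordinary tool carrying a destructive flag
    return "allow"
```

With this shape, `decide("r''m -rf ./build")` returns `"block"` because the decision is made on the word `rm`, not on the raw text. The refusal of unresolved characters produces false positives on harmless quoted input; that trade-off is an assumption of this sketch, and an allowlist of permitted programs would be stricter still.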

Adversa put the engineering cost at roughly two days for an experienced team. That number matters, because it means the other ten did not hit a hard research wall. They made a design choice, and it was the wrong one. The practical read for a defender: treat any command allowlist or dangerous-command blocklist feature in an agent as advisory, not as a control, unless the vendor can tell you it parses a command before it checks it. This is the same trust mistake behind an assistant running a repo's own config file with your privileges.

Watch the agent like any other privileged process

Because the guardrail is bypassable and the agent runs as you, the place to catch GuardFall-style abuse is downstream of the agent, not inside it. The agent process is privileged and now untrusted, so monitor it that way. Two signals are worth an alert by tomorrow morning: the agent binary spawning a shell that reads credential paths, such as the directories holding SSH and cloud keys, and outbound connections from an agent run to hosts that are not on an allowlist.

Neither signal depends on knowing the exact bash trick, which is the point. You are detecting the consequence, not the syntax. A detection setup that already watches process lineage and network egress will see the credential read and the exfiltration attempt even when the agent's own safety check waved the command through. The same instinct applies when a single web page turns a local AI agent into remote code execution: assume the guardrail fails and watch what the process does next.
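The two signals can be expressed as a small rule over process telemetry. The following is a minimal sketch under stated assumptions: the `ProcessEvent` record, the agent binary name, the credential paths and the egress allowlist are all hypothetical, and real EDR or audit schemas will differ.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ProcessEvent:
    """Hypothetical, simplified telemetry record."""
    lineage: List[str]                 # process names from root to this process
    file_read: Optional[str] = None    # path read by the process, if any
    remote_host: Optional[str] = None  # outbound destination, if any

AGENT_NAMES = {"agent"}                                   # assumed binary name
SHELLS = {"sh", "bash", "zsh", "dash"}
CREDENTIAL_DIRS = ("/home/dev/.ssh/", "/home/dev/.aws/")  # assumed paths
EGRESS_ALLOWLIST = {"github.com", "pypi.org"}             # assumed allowlist

def alerts(events: List[ProcessEvent]) -> List[str]:
    """Flag credential reads and unlisted egress that descend from the agent."""
    found = []
    for event in events:
        agent_at = next((i for i, n in enumerate(event.lineage) if n in AGENT_NAMES), None)
        if agent_at is None:
            continue  # not part of an agent run
        below_agent = event.lineage[agent_at + 1:]
        shell_spawned = any(name in SHELLS for name in below_agent)
        if shell_spawned and event.file_read and event.file_read.startswith(CREDENTIAL_DIRS):
            found.append(f"credential read: {event.file_read}")
        if event.remote_host and event.remote_host not in EGRESS_ALLOWLIST:
            found.append(f"unlisted egress: {event.remote_host}")
    return found
```

Nothing in the rule inspects the command text, so it behaves the same whether the command was spelled `rm`, `r''m`, or smuggled through a decoder.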

Give the agent a throwaway identity before its next pull

The fix that does not wait on any vendor shipping a better parser is to assume the guardrail will fail and contain what happens when it does. Four concrete steps:

  • Keep auto-execute and sandbox-skip flags off by default; turn them on only inside a disposable environment.

  • Give the agent a throwaway home directory with none of your real SSH or cloud credentials in it, so a successful bypass steals nothing of value.

  • Never let an agent run automatically on pull requests from forks. That is an attacker handing you the malicious repo and asking you to run it.

  • Treat a repository's config files as code you do not trust, no different from a script you just downloaded off the internet. Merely opening the project is enough to set the attack off, as it was when opening a folder in the editor ran npm supply-chain malware.
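The throwaway-identity step above could be approximated with a small launcher. This is a sketch for a POSIX system, with hypothetical function names and an assumed list of pass-through variables. It only swaps `HOME` and drops inherited environment variables; it does not stop a process from reading absolute paths, so a container or a separate user account is still needed for real isolation.

```python
import os
import subprocess
import tempfile
from typing import Dict, List

# Variables passed through unchanged; everything else is dropped (assumed list).
PASSTHROUGH = ("PATH", "LANG", "TERM")

def scrubbed_env(home: str) -> Dict[str, str]:
    """Build an environment with no inherited secrets and a substitute HOME."""
    env = {key: os.environ[key] for key in PASSTHROUGH if key in os.environ}
    env["HOME"] = home
    return env

def run_with_throwaway_home(argv: List[str], workdir: str) -> int:
    """Run argv with an empty temporary HOME that is deleted afterwards."""
    with tempfile.TemporaryDirectory(prefix="agent-home-") as home:
        result = subprocess.run(argv, cwd=workdir, env=scrubbed_env(home), check=False)
        return result.returncode
```

Tokens exported in the parent shell, such as cloud keys held in environment variables, are not passed to the child, and anything the agent writes under its home directory disappears when the run ends.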

The deeper point outlasts this one technique. We keep bolting AI features onto shells, browsers and package managers and assuming a text filter can referee them. It cannot, and the next bypass is already being written. Confinement is the control. The guardrail is a courtesy.


Frequently asked questions

What is GuardFall?

GuardFall is a technique from Adversa AI that bypasses the command safety checks in AI coding agents by exploiting how the bash shell rewrites text before running it. It worked against ten of eleven open-source agents tested, letting a poisoned repository run dangerous commands the guardrail was meant to block.

Which AI coding agents are affected?

Adversa AI reported that ten of eleven tested agents were bypassed: opencode, Goose, Cline, Roo-Code, Aider, Plandex, Open Interpreter, OpenHands, SWE-agent and Hermes. Only Continue resisted, because it parses commands the way the shell would before allowing them.

Is GuardFall being exploited in the wild?

No public exploitation has been reported, and Adversa AI describes GuardFall as lab research. The risk is real because the affected agents are widely used and often run with auto-execute enabled in continuous integration, where a malicious repository could trigger it.

How does the GuardFall bypass work?

The agent's safety check reads a command as plain text and compares it to a list of dangerous patterns. The shell then rewrites that text, removing quotes and expanding shortcuts, so a command disguised with empty quotes or encoding passes the check but still executes.

How do I protect AI coding agents from this?

Disable auto-execute and sandbox-skip options unless the agent runs in a disposable environment, give it a throwaway home directory without real credentials, block agents from running on fork pull requests, and treat repository config files as untrusted code.

Does patching the safety filter fix it?

No. A blocklist of dangerous command strings cannot keep pace with a shell that offers quote removal, word splitting, variable expansion and encoding. The durable fix is parsing the command as the shell would before deciding, which is what the one resistant agent does.
