Appium is one of the most widely used tools for automating tests on phones, and its official server for the Model Context Protocol (MCP) lets an AI agent drive those tests in plain language. A flaw disclosed on June 19 turns that convenience into a foothold: the mobile app being tested can inject code that runs inside the agent's interface and then calls the agent's own tools. The data the agent was sent to inspect becomes the code that controls it.
The bug, tracked as GHSA-x975-rgx4-5fh4, carries a CVSS score of 8.2 and affects appium-mcp at version 1.85.9 and earlier. The fix landed in 1.85.10, and the current release is 1.86.1. No CVE is assigned yet. The research group EQSTLab reported it, and it sits in the open vulnerability database as a high-severity cross-site scripting (XSS) issue.
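The version boundary above is easy to get wrong with string comparison, because "1.85.10" sorts before "1.85.9" as text. The sketch below shows one way a team might check an installed version numerically. The helper names (parseVersion, compareVersions, isAffected) are hypothetical and are not part of appium-mcp; only the 1.85.10 boundary comes from the advisory details given here. Pre-release suffixes are ignored.

```typescript
// Hypothetical helpers for illustration; not part of appium-mcp.
type Version = [number, number, number];

// Parse "1.85.9" or "v1.85.9" into numeric parts. Any suffix is ignored.
function parseVersion(v: string): Version {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  if (!m) throw new Error(`Unrecognized version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: Version, b: Version): number {
  if (a[0] !== b[0]) return a[0] - b[0];
  if (a[1] !== b[1]) return a[1] - b[1];
  return a[2] - b[2];
}

// First fixed release, per the advisory as described in this article.
const FIRST_FIXED: Version = [1, 85, 10];

function isAffected(installed: string): boolean {
  return compareVersions(parseVersion(installed), FIRST_FIXED) < 0;
}
```

A check like this could run in a CI step that reads the installed version from the lockfile and fails the build when isAffected returns true.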
How can a tested app run code in the agent?
The app under test controls its own on-screen text and element attributes. Appium reads those through the page source, and the server pastes them unescaped into the small interface the AI client renders. A crafted attribute, such as an image tag carrying an error handler, executes the moment the client draws the panel. No exploit toolkit is required, just control of what the app shows.
The weak spot is a function called createLocatorGeneratorUI. When an agent asks Appium to suggest selectors for on-screen elements, that function builds a panel listing each element's text, content description, resource ID, and the generated selector. In affected versions, it dropped those values straight into an HTML template without escaping any of them.
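To make the pattern concrete, here is a minimal sketch of the difference between interpolating app-controlled values directly and escaping them first. It is not the appium-mcp source: the ElementInfo shape and the renderRowUnsafe, renderRowSafe, and escapeHtml names are assumptions chosen for illustration.

```typescript
// Illustrative only; names and shapes are assumptions, not appium-mcp code.
interface ElementInfo {
  text: string;
  contentDesc: string;
  resourceId: string;
  selector: string;
}

// The vulnerable pattern: app-controlled strings go into markup as-is.
function renderRowUnsafe(el: ElementInfo): string {
  return `<tr><td>${el.text}</td><td>${el.contentDesc}</td>` +
    `<td>${el.resourceId}</td><td>${el.selector}</td></tr>`;
}

// Replace the five characters that can change HTML structure.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The same row with every value escaped before interpolation.
function renderRowSafe(el: ElementInfo): string {
  return `<tr><td>${escapeHtml(el.text)}</td><td>${escapeHtml(el.contentDesc)}</td>` +
    `<td>${escapeHtml(el.resourceId)}</td><td>${escapeHtml(el.selector)}</td></tr>`;
}

// An element whose visible text is markup rather than a label.
const hostile: ElementInfo = {
  text: '<img src=x onerror="alert(1)">',
  contentDesc: "Login",
  resourceId: "com.example:id/login",
  selector: "~Login",
};
```

With the hostile element, renderRowUnsafe emits a live img tag, while renderRowSafe emits the same characters as inert text that the panel displays rather than executes.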
The postMessage bridge is the real payload
Injected script alone would be a content problem. What makes this serious is where the script can reach. MCP UI resources render inside the client and are allowed to talk back to it with window.parent.postMessage. That channel is how a legitimate panel asks the client to run a tool. Borrowed by injected script, it becomes a way to invoke any tool the agent has registered: take a screenshot, read the page source, or do anything else the host exposes. The call goes through without a human approving it.
If the agent's host also exposes file or shell utilities, the blast radius grows from data theft to lateral movement on the developer's machine. In a continuous integration pipeline, where these agents increasingly run unattended, a single hostile test target could reach whatever that runner can reach.
Why the usual XSS math undersells this
The score lists user interaction as required. For a person clicking around a page, that caveat is real and it lowers the risk. For an agent, the interaction is just the agent doing the job it was told to do: calling generate_locators on whatever app it was pointed at. An autonomous testing loop trips the trigger on its own, with no one watching. The human-in-the-loop assumption baked into most cross-site scripting ratings does not survive contact with software that acts by itself.
There is a second lesson in the patch. A neighboring function in the same file, createPageSourceInspectorUI, already escaped its input. One function was hardened and the one beside it was not. Manual escaping is a coin flip at scale: it holds only while every author remembers it in every function, and in createLocatorGeneratorUI that step was apparently missed. The durable fixes are escaping that is applied automatically by context, or UI resources that carry no executable markup at all.
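One way to take the "remember to escape" step out of human hands is a tagged template that escapes every interpolated value by default. The sketch below is an assumption-laden illustration, not the fix that shipped in appium-mcp: the html tag and SafeHtml class are hypothetical names, and the escaping covers only HTML text and quoted attribute values, not script or URL contexts.

```typescript
// Hypothetical auto-escaping template tag; not the appium-mcp patch.
const ENTITIES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Wrapper marking a string as already-escaped markup.
class SafeHtml {
  constructor(public readonly value: string) {}
}

// Every interpolated value is escaped unless it is SafeHtml,
// which can only be produced by this tag.
function html(strings: TemplateStringsArray, ...values: unknown[]): SafeHtml {
  let out = strings[0] ?? "";
  values.forEach((v, i) => {
    out += v instanceof SafeHtml
      ? v.value
      : String(v).replace(/[&<>"']/g, (c) => ENTITIES[c] ?? c);
    out += strings[i + 1] ?? "";
  });
  return new SafeHtml(out);
}

// App-controlled text is neutralized without the author doing anything.
const cell = html`<td>${'<img src=x onerror="alert(1)">'}</td>`;
// Nesting a SafeHtml fragment does not double-escape it.
const row = html`<tr>${cell}</tr>`;
```

The design choice is that the safe path is the default: forgetting to escape is no longer possible, and bypassing it requires constructing SafeHtml by hand, which stands out in review.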
The trust boundary that moved
In ordinary Appium use, the app under test is the target. It has no special standing and no path to the machine driving it. Wire the same setup to an AI agent and the relationship inverts. The app's strings are now input to code the agent renders and to tools the agent can fire. Anyone who can influence what appears on the screen, whether through a shared test build, a third-party SDK, or a web view loading remote content, gets a say in what the agent does next.
What to do now
Update appium-mcp to 1.85.10 or later; 1.86.1 is current. If you cannot update right away, do not point the MCP server at apps you do not fully control, and avoid running the locator generator against untrusted screens. Teams standing up MCP servers more broadly should treat every rendered resource as attacker-influenced input: sandbox the renderer, escape output by context, and limit which tools a UI resource may call rather than trusting the panel's origin.
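The last recommendation, limiting which tools a UI resource may call, can be sketched as a gate the host runs on every message from a rendered panel. Everything here is an assumption for illustration: the message shape, the ResourcePolicy type, and the tool names other than generate_locators are hypothetical, and real MCP clients may structure this differently.

```typescript
// Hypothetical host-side gate; message shape and names are assumptions.
interface ToolCallMessage {
  type: "tool";
  payload: { toolName: string; params?: unknown };
}

type Decision = "allow" | "ask-user" | "deny";

// Per-resource policy: anything not listed is denied.
interface ResourcePolicy {
  allowed: ReadonlySet<string>;
  needsApproval: ReadonlySet<string>;
}

// Validate the untrusted message before reading any field from it.
function isToolCallMessage(data: unknown): data is ToolCallMessage {
  if (typeof data !== "object" || data === null) return false;
  const d = data as Record<string, unknown>;
  if (d["type"] !== "tool") return false;
  const p = d["payload"];
  if (typeof p !== "object" || p === null) return false;
  return typeof (p as Record<string, unknown>)["toolName"] === "string";
}

// Decide what to do with a message posted by a rendered UI resource.
function decide(data: unknown, policy: ResourcePolicy): Decision {
  if (!isToolCallMessage(data)) return "deny";
  const name = data.payload.toolName;
  if (policy.allowed.has(name)) return "allow";
  if (policy.needsApproval.has(name)) return "ask-user";
  return "deny";
}

// Example policy for a locator panel (tool names partly hypothetical).
const locatorPanelPolicy: ResourcePolicy = {
  allowed: new Set(["generate_locators"]),
  needsApproval: new Set(["take_screenshot"]),
};
```

Under this policy, a message asking for a shell or file tool from the locator panel is denied even if the script that sent it is running inside a legitimately rendered resource, which is the point of not trusting the panel's origin.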
This is the same shape as other recent agent bugs. A rigged document drove Langflow into full server takeover, and a poisoned search result steered Microsoft 365 Copilot into leaking data. AI development tooling is a target in its own right now, from plugins that quietly steal AI API keys to test harnesses that hand control to whatever app is on the screen. For an AI agent, the content it reads is an attack surface, and the tools it holds are the payload.