Home/ Blog/ Security news/ Article
Blog · Security news

A new Mac backdoor is built to fool the AI that inspects it

macOS.Gaslight embeds fake AI system messages to make automated, LLM-assisted malware analysis abort. What this North Korea linked backdoor means for SOC

Smooth ceramic mask on a stone slab with tangled wires spilling from its hollow back

For years the cat and mouse game in malware analysis has been about the sandbox. A sample checks whether it is running inside an automated detonation environment and goes quiet if it thinks it is being watched. A newly documented macOS backdoor moves the target. Instead of hiding from the sandbox, it tries to talk its way past the analyst, or more precisely, past the AI assistant a growing number of analysts lean on for the first pass over a suspicious file.

SentinelLabs researcher Phil Stokes published an analysis on June 23, 2026 of a Rust implant the team calls macOS.Gaslight. Inside the binary sits a block of text, roughly 3.5 KB, formatted to look like the internal instructions of a large language model triage tool. It carries 38 fabricated system messages: fake warnings about expired tokens, out of memory kills, full disks, and repeated operation failures, all written to convince an AI agent reading the file that it should stop, shorten, or refuse the analysis. SentinelLabs attributes the malware with high confidence to a cluster of North Korea aligned macOS activity, and notes that Apple's built in XProtect scanner already flags the family under the signature names MACOS_BONZAI_COBUCH and AIRPILE.

What macOS.Gaslight actually does

Strip away the anti analysis trick and the implant is a capable credential stealer. It harvests data from Chrome, Brave, Firefox, and Safari, reads terminal history and the list of installed applications, and copies the login keychain database. It opens an interactive remote shell with a small command set, and it talks to its operators over the Telegram Bot API, polling for instructions in a loop encrypted with AES-GCM and pinned to a specific certificate. Persistence is a LaunchAgent disguised with the label com.apple.system.services.activity, picked to sit unnoticed among real Apple background jobs.

One detail matters for anyone tracking supply chain risk. A second stage stealer written in Python arrives as a base64 blob that, at run time, downloads a standalone CPython 3.10.18 build from a public GitHub project and runs itself with it. SentinelLabs says it has not seen that runtime fetch pattern documented before. The malware ships almost no interpreter of its own and pulls a clean one from a trusted source on demand, which keeps its on disk footprint small and its dependencies off the sample entirely.

Why fooling the analyst is the real story

The credential theft is ordinary. The text aimed at the AI is not. For two decades, anti analysis meant evading machines: detecting the debugger, the virtual machine, the sandbox timer. macOS.Gaslight targets a reasoning layer instead. It does not exploit a bug in any AI product. It exploits the same weakness behind every prompt injection story: a model treats authoritative looking instructions in its input as commands to obey, even when those instructions are sitting inside the very thing it was asked to examine.

That inverts a defensive assumption worth stating plainly. As SOC teams wire large language models into triage, the model's output becomes part of the control flow, and now part of it is attacker controllable. An AI verdict of benign, incomplete, or analysis aborted is no longer a neutral signal. A refusal or a truncated run is something a sample can cause on purpose. Treating that output as a clean bill of health is exactly the move the malware is built to provoke.

Can malware really manipulate an AI security tool?

Yes. macOS.Gaslight embeds text crafted to read as the system prompt of an AI triage assistant, feeding it false errors so the model aborts or cuts short its review. It does not break the AI through a software flaw. It abuses the model's tendency to follow any authoritative instruction it finds in its input, including instructions planted inside the file under analysis.

This is the predictable next step, not a one off. Sandbox evasion went from novel to standard malware furniture over a decade. Expect the same arc here. As AI assisted triage spreads, writing for that reader becomes a default feature, the way checking for VMware once did. The DPRK aligned crews behind families like this have a long record of patient, analysis aware tradecraft, and they tend to set the pattern others copy.

What defenders should do now

  • Do not accept an AI triage result of benign, incomplete, or aborted as proof a sample is safe. Treat a refused or truncated analysis as a reason to look harder, not a reason to close the ticket.
  • Keep a deterministic analysis path that does not depend on a model's cooperation: static unpacking, string extraction, and behavioral detonation that run regardless of what any text inside the sample says.
  • Hunt for the injection itself. Legitimate software does not carry fake model system messages. A binary holding blocks of text shaped like LLM instructions, fake error scaffolding, or templated delimiters is suspicious on its face, and that signature is far easier to write a rule for than the malware's behavior.
  • On macOS, confirm XProtect is current, and alert on LaunchAgents that impersonate Apple service labels and on any process that downloads and runs a standalone Python interpreter from the internet.

The reflex when AI lands in the SOC is to trust the parts it automates. macOS.Gaslight is an early argument against that reflex. The model is now part of your attack surface, and the first crews to understand that are writing malware that reads its mind. Detection that assumes the tooling can be turned against you is the version that survives contact.

Topics

Frequently asked questions

What is macOS.Gaslight?

macOS.Gaslight is a Rust based macOS backdoor and credential stealer documented by SentinelLabs on June 23, 2026. It harvests browser data, terminal history, and the login keychain, talks to operators over Telegram, and is notable for embedding text designed to mislead AI assisted malware analysis.

How does macOS.Gaslight try to fool AI analysis tools?

It embeds a block of 38 fabricated system messages formatted to look like an AI triage tool's own instructions. They contain fake warnings about expired tokens, memory limits, and failures, all aimed at pushing a large language model into aborting or cutting short its review of the file.

Who is behind macOS.Gaslight?

SentinelLabs attributes the malware with high confidence to a cluster of North Korea aligned macOS activity. That is the researchers' assessment based on overlap with known tooling and tradecraft, not an independent legal attribution, and Suriq reports it as their finding rather than as established fact.

Does Apple already detect this malware?

Yes. According to SentinelLabs, Apple's built in XProtect scanner flags the family under the signature names MACOS_BONZAI_COBUCH and AIRPILE. Keeping XProtect definitions current gives macOS endpoints baseline coverage, though defenders should still treat persistence and Telegram based command traffic as additional hunting signals.

What should security teams do about AI assisted triage after this?

Stop treating an AI verdict of benign or aborted as proof a sample is clean, since malware can now provoke that outcome on purpose. Keep a deterministic analysis path that does not rely on a model, and alert on binaries that contain text shaped like AI system instructions.

Ready to meet the Guardians?

Deploys fast - agentless for monitoring and cloud, a lightweight agent for deep endpoint security. Just Suriq, standing watch.