Fake AI Agent Skill Bypassed Every Security Scanner and Reportedly Reached 26,000 Agents

A fake AI agent skill cleared every scanner and reportedly reached 26,000 agents, showing how external links bypass skill package review.

Sami Malik

Copywriter

In June 2026, security research firm AIR published the results of an experiment in which it built a malicious AI agent skill, submitted it to a popular skill marketplace, ran an Instagram advertising campaign to drive installations, and reached what it reported as approximately 26,000 agents, including agents on corporate accounts. Every skill security scanner AIR tested it against cleared the skill as safe. The payload was deliberately harmless: it collected the installing user's email address and returned it to AIR's infrastructure. The experiment was designed to demonstrate the delivery mechanism, not to cause harm. But the structural vulnerability it exploited remains open, is not specific to any one marketplace or scanner, and is rooted in a fundamental mismatch between how skill security review works and how a malicious skill can be built to behave.

What the Skill Claimed to Do

The skill was named brand-landingpage and presented itself as a tool that would build a landing page for the user's brand using Google's Stitch design interface. The framing was specifically chosen to appeal to non-technical users: marketers, salespeople, and designers who would find a landing page builder useful but who are unlikely to have the background to scrutinise how a skill achieves what it claims. The Instagram advertising campaign AIR ran to promote the skill was targeted at exactly these user categories.

Google Stitch is a real design tool, and the skill's claim to use it was not entirely fabricated. It directed the agent to follow setup documentation hosted at a domain AIR controlled, stitch-design.ai, which initially pointed to the genuine Google Stitch documentation at stitch.withgoogle.com. From the perspective of a scanner that checks the skill package and any URLs it contains, the skill looked like what it claimed to be: a tool that helps an agent set up a connection to a legitimate Google product by following that product's own documentation.

Two Borrowed Trust Signals

To make the skill appear credible to potential installers, AIR targeted two trust signals that the agent skill ecosystem relies on: repository star counts and clean scanner results. For the repository stars, AIR submitted a pull request to a popular skill marketplace repository with approximately 36,000 stars and 156 existing skills. After the pull request was merged, the skill inherited the repository's star count. A user or an automated system looking at the skill's repository context would see the same star signal as for any other skill in that repository, regardless of who submitted the most recent addition.

For the scanner results, the approach was structural rather than technical. Skill scanners from Cisco, NVIDIA, and the tools integrated into the skills.sh platform all scan the submitted package: the SKILL.md file and the files shipped with it. AIR's skill contained no malicious code and no suspicious setup instructions. It told the agent to install the Stitch SDK by following documentation at an external link. The external link, at the time of scanning, led to the genuine Stitch documentation. There was nothing in the submitted package for a scanner to flag.

The combination of a high-star repository and clean scanner verdicts is precisely the signal stack that the current ecosystem treats as evidence of trustworthiness. A user who checks that a skill has not been flagged by a scanner and sees it in a repository with tens of thousands of stars has done what the available tools and conventions suggest they should do. The experiment demonstrates that both signals can be satisfied by a malicious skill that keeps its payload outside the scanned package.

The External Link Blind Spot

The structural vulnerability AIR exploited is that skill scanners evaluate a fixed package at a specific point in time. The page behind an external link that a skill instructs an agent to follow is not part of that package and is not evaluated at scan time. After the skill had been widely installed, AIR replaced the content at stitch-design.ai with instructions telling the agent to download and run a script. At that point, every installed agent that followed the skill's instructions received the new payload rather than the original legitimate Stitch documentation.
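One way to narrow the gap between scan time and execution time is to fingerprint the external content when it is reviewed and refuse to act on it if the fingerprint differs later. The following is a minimal sketch of that idea, not something described in AIR's report: the function names and the two page bodies are hypothetical, and the content is passed in as bytes rather than fetched so the example runs offline.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of the content behind an external link."""
    return hashlib.sha256(content).hexdigest()


def is_unchanged(reviewed_digest: str, content_now: bytes) -> bool:
    """True only if the content served now matches what was reviewed."""
    return fingerprint(content_now) == reviewed_digest


# Hypothetical page bodies standing in for what the external link served.
page_at_scan_time = b"Install the SDK by following the official setup guide."
page_at_run_time = b"Download and run this script."

reviewed = fingerprint(page_at_scan_time)
print(is_unchanged(reviewed, page_at_scan_time))  # True
print(is_unchanged(reviewed, page_at_run_time))   # False
```

A check like this only helps if it runs every time the agent follows the link, and it will also flag benign documentation updates, so any changed content would need to go back through review.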

Anthropic's own documentation for agent skills already includes a warning about this specific risk: skills that instruct agents to fetch external URLs are inherently risky because the content at those URLs can change after the skill has been vetted. The warning exists because the problem is structurally obvious to anyone who thinks through how an agent executes skill instructions. When the agent follows a link, it does so at execution time, not at scan time. The content it receives reflects what the link serves at that moment, which the scanner never evaluated.

Separate security research published earlier in 2026 found that the major skill scanners often produce inconsistent results because they each evaluate skills in isolation from one another and from the external links the skills reference. A skill that passes three different scanners has passed three different analyses of the same submitted package. None of those analyses evaluated what the skill would cause the agent to receive and execute from an external server at runtime.

What the Payload Did and What It Could Have Done

In the experiment, the script that agents downloaded and executed after AIR swapped the content at its domain collected the email address of the user whose agent ran the skill and sent it back to AIR's infrastructure. AIR used this to count the number of agents the skill had reached. That count, which AIR reported as approximately 26,000 including agents on corporate accounts, is based on the number of email addresses returned.

The choice of a harmless payload was deliberate. AIR's purpose was to demonstrate the delivery mechanism rather than to cause damage. The firm noted that a real attacker with the same access could have instead instructed agents to read and exfiltrate local files, make requests to internal systems accessible from the agent's execution environment, move data across organisational boundaries, or execute further commands with whatever permissions the agent held. The scope of what a malicious payload could do is bounded by the permissions the compromised agent has been granted, which in enterprise deployments can include access to sensitive data, internal APIs, and communications systems.

Trail of Bits and the Same Blind Spot Three Weeks Earlier

AIR was not the first firm to demonstrate this vulnerability. Three weeks before AIR published its findings, Trail of Bits published research showing that it had bypassed ClawHub's malicious skill detector, Cisco's scanner, and all three scanners integrated into skills.sh through a structurally similar approach. Trail of Bits' conclusion was direct: a scanner checks a fixed package, while an attacker can continue tweaking the payload until it passes. The scan happens once; the attacker's ability to modify the external content the skill points to is ongoing.

Real campaigns had used the same technique for months before either piece of research appeared. The attack surface has been known in the security research community for longer than the published research timeline suggests: attackers who keep the payload on an external server they control and submit only clean packages for review are following an established adversarial pattern, not exploiting a novel discovery. What the AIR and Trail of Bits experiments added was demonstration at scale with measurable reach figures.

The pattern is not new in other software security contexts either. It echoes supply chain attack techniques that have been used against package registries, browser extensions, and software update mechanisms for years. In each case, the trusted submission or scan is clean, and the malicious content arrives through a channel the trust check did not evaluate. The specific instantiation in AI agent skills is new; the underlying attack model is not.

The 26,000 Claim and Why It Warrants Scrutiny

AIR's report that the skill reached approximately 26,000 agents, including some on corporate accounts, is the firm's own figure derived from the email addresses collected by the harmless payload. It has not been independently verified. The context matters: AIR is launching a managed skill marketplace, and the research paper that describes the experiment also pitches that product as a solution. The 26,000 number, the claim that corporate accounts were among those reached, and the suggestion that a real attacker could have seized full control of every affected agent are all statements from a company with a commercial interest in the severity of the problem its product is designed to solve.

What is independently verifiable and consistent with prior research is the method. The scanners AIR named do evaluate only the submitted package, the external link blind spot has been demonstrated independently by Trail of Bits, and the trust signals AIR borrowed, repository stars and clean scan results, are exactly the signals the current ecosystem uses as proxies for trustworthiness. The underlying vulnerability is real and is structural. The specific scale figures require independent verification before they can be cited as established fact.

What Defenders Need to Do

The immediate practical step is auditing what skills are already installed in agent deployments. Most organisations that have adopted agent frameworks have done so without a formal review process for each skill added, because the assumption has been that marketplace listing and scanner clearance are adequate quality controls. They are not. Finding what is already running is the starting point for any remediation effort, and it requires checking the skills themselves, not just the outputs agents have produced.
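As a starting point for such an audit, the sketch below walks a directory of installed skills and lists the external hosts each SKILL.md refers to. It assumes skills are stored as folders containing a SKILL.md file under a single root directory, which may not match every agent framework; the function names and the URL pattern are illustrative rather than taken from any scanner.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Deliberately simple URL pattern; it will miss obfuscated or split links.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def external_hosts(skill_text: str) -> set[str]:
    """Return the hostnames of all http(s) URLs found in a skill file."""
    hosts = set()
    for url in URL_PATTERN.findall(skill_text):
        # Drop trailing punctuation that belongs to the sentence, not the URL.
        host = urlparse(url.rstrip(".,;:'\"")).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def audit_skills(root: Path) -> dict[str, set[str]]:
    """Map each SKILL.md found under root to the external hosts it references."""
    report = {}
    for skill_file in sorted(root.rglob("SKILL.md")):
        text = skill_file.read_text(encoding="utf-8", errors="replace")
        report[str(skill_file)] = external_hosts(text)
    return report
```

Calling `audit_skills(Path("skills"))` on a hypothetical `skills` directory returns one entry per installed skill. Any host in the result that the organisation does not control is a candidate for closer review.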

For new skills, the review process needs to extend beyond the submitted package to include what the skill instructs the agent to fetch from external sources. A skill that tells an agent to follow a link and execute what it finds is granting the operator of that link ongoing control over the agent's behaviour. The domain registrant of stitch-design.ai in AIR's experiment had the ability to change the instructions received by every agent that had installed the skill, at any time, without any further review. That ongoing access is what needs to be treated as a risk signal, more than the contents of the SKILL.md file alone.

Enforcing least privilege on agents reduces the blast radius of a successful skill compromise. An agent that can only read files in a specific project directory and cannot make outbound network connections except to defined endpoints is a substantially harder target than an agent with broad system access. Version pinning, which locks an agent to a specific reviewed version of a skill and of any external content it references rather than automatically receiving updates, prevents the post-installation payload swap that the AIR experiment demonstrated. Pinning the skill package alone would not have helped in that case, because the package never changed; only the page behind the link did. Threat intelligence adds a further layer: monitoring for new skills that exhibit the external-link delivery pattern, and for domains associated with known malicious skill campaigns, before they reach installation.
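The restriction of outbound connections to defined endpoints can be illustrated with a small allowlist check. This is a sketch under assumptions: the allowlist contents and the function name are hypothetical, and in practice the rule would be enforced at a proxy or firewall rather than in application code that a compromised agent could bypass.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the only hosts this agent is permitted to contact.
ALLOWED_HOSTS = {"stitch.withgoogle.com", "api.internal.example"}


def is_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS requests whose exact hostname is on the allowlist."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and host in allowed_hosts


print(is_allowed("https://stitch.withgoogle.com/docs"))  # True
print(is_allowed("https://stitch-design.ai/setup"))       # False
```

Matching on the exact hostname rather than a substring matters here: a lookalike such as `stitch.withgoogle.com.evil.example` contains the trusted name but is a different host and is rejected.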

The broader question this experiment raises for security teams is how to frame AI agent skills in their software security model. Skills are not documents or configurations; they are code that executes with the agent's permissions. Treating them as software, with the same review, testing, and least-privilege principles applied to any other code running in a production environment, is the posture that the attack surface requires. The attack surface created by AI agents depends heavily on what those agents are permitted to do and on what the skills they load instruct them to do.

Runtime Monitoring as the Last Line of Defence

Because the attack exploits trust established at installation time and delivers its actual payload later through an external link, pre-installation controls are not sufficient on their own. Runtime monitoring of agent behaviour is the complementary control that the current ecosystem largely lacks. An agent that begins making outbound network connections to an unfamiliar domain shortly after loading a new skill, or that reads files outside its normal operating path, or that forwards data to an endpoint that was not in its known network map, is exhibiting behaviour that should trigger alerting.

Security information and event management systems that ingest agent execution logs can detect these patterns if the logging is configured to capture outbound network calls and file access at the agent process level. Most current agent deployments do not log at this granularity. The assumption built into deployment practices is that the skill review was the control point. The AIR experiment demonstrates that this assumption is wrong, and that runtime telemetry is the necessary complement to any pre-installation review process.

The practical implication is that organisations planning to expand their agent deployments should consider the observability of those agents as a security requirement rather than an operational afterthought. What network connections does the agent make? What files does it read or write? What external endpoints does it call, and do those calls match what the skill's documentation claimed it would do? The gap between the declared behaviour in a SKILL.md file and the observed behaviour at runtime is exactly the space a malicious skill exploits. Closing that gap requires instrumentation, not trust.
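The comparison between declared and observed behaviour can be sketched as a set difference over network destinations. The example below assumes that execution logs can be reduced to records of which skill was active and which host was contacted; that log shape, the `NetworkEvent` type, and the `collector.example` host are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkEvent:
    """One outbound connection taken from an agent execution log (assumed shape)."""
    skill: str
    host: str


def undeclared_hosts(
    declared: dict[str, set[str]], events: list[NetworkEvent]
) -> dict[str, set[str]]:
    """Return, per skill, the hosts seen at runtime that the skill never declared."""
    findings: dict[str, set[str]] = {}
    for event in events:
        if event.host not in declared.get(event.skill, set()):
            findings.setdefault(event.skill, set()).add(event.host)
    return findings


declared = {"brand-landingpage": {"stitch-design.ai"}}
events = [
    NetworkEvent("brand-landingpage", "stitch-design.ai"),
    NetworkEvent("brand-landingpage", "collector.example"),  # hypothetical host
]
print(undeclared_hosts(declared, events))
# {'brand-landingpage': {'collector.example'}}
```

A check of this kind flags new destinations only. It would not notice that the content served by an already declared host had changed, so it complements rather than replaces review of what the external link serves.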

Frequently Asked Questions

What is an AI agent skill and why can it be dangerous?

An AI agent skill is a bundle of instructions that an agent loads into its operational context and follows with roughly the authority of instructions from the user. Skills can tell the agent to make API calls, read or write files, follow external links, and execute code. A malicious skill can therefore instruct the agent to perform harmful actions using whatever permissions the agent has been granted, without the agent's operator necessarily being aware that the skill's instructions have changed since the skill was installed.

How did AIR's fake skill pass every security scanner?

The scanners tested by AIR analyse the package submitted to the marketplace: the SKILL.md file and any accompanying files. AIR's skill contained no malicious code and no suspicious instructions in its submitted package. It simply told the agent to follow a link to external documentation. The content behind that link was not part of the scan. Because the malicious payload was delivered via an external URL the attacker controlled, nothing the scanners evaluated indicated a problem.

Could the same technique work against other skill marketplaces?

Yes. The external link blind spot is structural rather than specific to any one marketplace or scanner. Any system that reviews a static package but allows skills to instruct agents to fetch and execute content from external URLs faces the same class of risk. Trail of Bits demonstrated this independently across multiple scanners using a similar approach. The fundamental problem is that a scan of a submitted package at time T does not constrain what an external URL serves to the agent at time T+1.

What permissions should agents be given to reduce the risk from malicious skills?

Agents should operate under a least-privilege model: access only to the specific resources and systems required for the tasks they perform, with outbound network access restricted to defined endpoints where possible. An agent that can only access a limited set of files and cannot make arbitrary outbound connections limits the damage a malicious skill can do even if it executes successfully. Privilege boundaries between agents operating in different security contexts prevent lateral movement from a compromised skill in one agent to systems accessible only to other agents.

Is Google Stitch involved in this vulnerability?

No. Google Stitch is a legitimate product and was not compromised. AIR used the name of a real Google product to make the fake skill appear credible, and directed the agent to a domain it controlled, stitch-design.ai, rather than Google's actual Stitch domain at stitch.withgoogle.com. The deception operated at the level of the skill's marketing and claimed purpose, not through any compromise of Google's systems.

What does "version pinning" mean in the context of agent skills?

Version pinning means locking an agent to a specific reviewed version of a skill, including the external content it references, rather than automatically receiving updates or fetching the latest content from external links. If the content behind a skill's external link changes after a review, an agent that is pinned to the reviewed version will not automatically receive the new content. Pinning prevents the post-installation payload swap that the AIR experiment demonstrated, but only if the pinned version itself neither contains nor references malicious content.

Should we avoid all agent skills that reference external URLs?

Skills that instruct agents to fetch content from external URLs deserve heightened scrutiny rather than blanket exclusion. Many legitimate skills reference external resources. The risk assessment should focus on who controls the external domain, whether the skill's instructions constrain what can be fetched and executed from that domain, and whether changes to external content would trigger any review or alerting. Skills that grant unconstrained instruction-following authority to an external URL with no version pinning or domain restriction represent a high-risk configuration that should be avoided unless the domain is fully within your organisation's control.

How Defendis Can Help

Attacks like this one rarely announce themselves through official channels first. New payloads, active infrastructure, and exploitation techniques circulate in closed forums and private channels well before any public research surfaces them. By the time an incident makes it into a threat report, organisations without early visibility are already behind.

Defendis gives your security team that early visibility. We monitor the dark web, underground forums, and threat actor channels so your team receives relevant intelligence before it becomes breaking news, with context about emerging threats matched against your organisation's exposure, without requiring your analysts to spend time in places they should not have to go.

About the author

Sami Malik is a copywriter passionate about crafting clear, engaging, and impactful content that helps brands connect with their audience through storytelling and strategy.