AI Supply Chain Attacks on Hugging Face and ClawHub

Researchers found 341 malicious agent skills on ClawHub and trojanised models on Hugging Face deploying AMOS stealer. How the attack works and what to audit.
Sami Malik
Copywriter

Security researchers identified a fresh wave of trojanised artificial intelligence models on Hugging Face and malicious skills on ClawHub, the public registry for the OpenClaw agent ecosystem, in May 2026. Payload delivery has already occurred across Windows, macOS, and Linux environments, with Atomic macOS Stealer among the confirmed malware families deployed. This campaign is part of a pattern stretching back to at least February 2024, when JFrog's security team first documented over 100 models capable of executing arbitrary code on any machine that loaded them.

What sets this category of attack apart from a phishing email or an exposed credential is where it lives: inside the development toolchain itself. Your perimeter controls look outward, at the internet. This threat enters through a repository your data scientists treat as a trusted library of building blocks.

Understanding how it works, and why your pipeline is almost certainly not checking for it, matters more than the list of indicators at the bottom of this page.

What the Incident Involved

In early May 2026, researchers at Koi Security identified 341 malicious skills across ClawHub's registry of approximately 2,857 published agent skills. Of those, 335 were attributed to a single coordinated operation called ClawHavoc, concentrated in two accounts: hightower6eu, responsible for 334 malicious skills, and sakaen736jih, contributing 199. The skills presented as legitimate utilities but executed malicious code when called by an AI agent, bypassing any human review the end user might expect.

OpenClaw is a widely used AI agent platform with 3.2 million users and formal integration with OpenAI tooling. ClawHub is its public skill registry, the equivalent of a package repository but for agent capabilities. Poisoned skills do not wait for a developer to download and run them manually. They execute autonomously when selected by the agent, operating with whatever permissions that agent already holds: database access, API credentials, cloud tokens, internal network reach.

Alongside the ClawHub operation, multiple Hugging Face repositories were distributing malicious files through multi-step infection chains. Atomic macOS Stealer, distributed as a Malware-as-a-Service tool through Telegram channels since 2023, was among the confirmed payloads. By 2025 it accounted for almost 40% of macOS malware protection updates, more than double any other family tracked in the same period. On a compromised machine it collects Keychain passwords, browser credentials from Chromium and Firefox profiles, cryptocurrency wallet seed phrases, SSH keys, and developer API keys, then exfiltrates them to a remote server. Variants shipping from early 2025 added a persistent backdoor enabling remote command execution and keylogging after the initial infection, turning a theft tool into a long-term access mechanism.

The Model That Reached 244,000 Downloads

On 7 May 2026, a repository called Open-OSS/privacy-filter appeared on Hugging Face with a model card mimicking an official OpenAI privacy tool. Within eighteen hours it held the number-one trending position on the platform, accumulating approximately 244,000 downloads and 667 likes. HiddenLayer's researchers assessed both figures as artificially inflated to signal credibility to real users before the repository was taken down.

The payload was a Rust-based infostealer targeting Chromium and Firefox credentials, Discord local storage tokens, cryptocurrency wallet data, SSH key files, and FileZilla FTP configurations. It established persistence via a scheduled task masquerading as a Microsoft Edge update and routed command-and-control traffic through jsonkeeper.com. HiddenLayer identified six additional repositories using near-identical loader logic in the days that followed, pointing to a coordinated supply chain operation rather than an isolated actor with a single shot.

Why Pickle Makes This Possible

The technical root of the problem predates these campaigns by years. When AI models are saved and distributed, they are often serialised using Python's pickle format, the same mechanism that stores a trained neural network as a file another developer can load and run. Pickle is convenient. It is also, by design, executable.

When Python deserialises a pickle file, it processes opcodes one by one. Any object that defines a __reduce__ method can instruct the deserialiser to call an arbitrary function during loading. An attacker who controls the contents of a model file embeds that instruction at the start of the byte stream. The moment a developer runs torch.load("model.pkl"), the payload executes. No prompt, no visible warning, no detectable anomaly before the reverse shell opens.
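A harmless sketch of that mechanism, using the standard library's pickle module directly rather than torch.load(). The class name is hypothetical, and sorted is only a benign stand-in for whatever callable an attacker would choose.

```python
import pickle


class LooksLikeWeights:
    """Illustrative class; the name is hypothetical."""

    def __reduce__(self):
        # Tells the unpickler: "to rebuild me, call sorted('cba')".
        # sorted is a harmless stand-in for an attacker-chosen callable.
        return (sorted, ("cba",))


blob = pickle.dumps(LooksLikeWeights())

# Loading calls sorted("cba") and hands back its return value.
# No LooksLikeWeights instance is ever reconstructed.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The caller asked for an object and got the result of a function call instead. Nothing in pickle.loads() distinguishes that from an ordinary load.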

The baller423 Case

JFrog's security team documented this precisely in research published in February 2024. A PyTorch model uploaded by a user called baller423, in a repository named goober2, contained a __reduce__-based payload that established a reverse shell to 210.117.212.93 on port 4242, an IP address assigned to KREOnet, South Korea's research network, which had likely been compromised as infrastructure. The model loaded silently: no error, no visible side effect in the process that called it.

JFrog set up a honeypot with realistic data scientist credentials to capture what the attacker would do with access. The reverse shell connected and then dropped after a day, suggesting either a test run or an operator who lost interest. After baller423 was removed, JFrog found an active variant at star23/baller13, connecting to 136.243.156.120 on port 53252. Across the full scan of the platform, JFrog confirmed over 100 models carrying genuine malicious payloads, excluding false positives.

When the Scanner Misses the Payload

Hugging Face responded to the 2024 findings by integrating PickleScan into its automatic model scanning pipeline. In February 2025, ReversingLabs documented a technique called nullifAI that bypasses it. Instead of the default ZIP archive format used by PyTorch, the attacker compresses the model using 7z. PyTorch's own torch.load() cannot parse 7z-compressed files, so PickleScan does not attempt to deserialise the contents. The malicious opcodes embedded at the start of the pickle byte stream never get read by the scanner.

The subtlety is that pickle opcodes are executed as they are encountered, so a payload placed at the start of the stream runs before the parser reaches any broken instruction further down. A developer who manually extracts the 7z archive and loads the model through a custom path triggers the reverse shell before any error appears. Hugging Face updated PickleScan within 24 hours of disclosure, but the technique makes a broader point: detection tools built around expected file formats cannot anticipate every compression or encoding variation an attacker might try next.
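One way to see why a broken stream should not end a scan early is to walk the opcodes with the standard library's pickletools and report whatever was referenced before the parse failed. This is a minimal sketch, not how PickleScan is implemented: list_imports is a hypothetical helper, its handling of STACK_GLOBAL is a heuristic that ignores memo lookups, and the demo payload is a harmless sorted call.

```python
import pickle
import pickletools


def list_imports(data: bytes):
    """Walk pickle opcodes without executing them.

    Returns (imports, error): the (module, name) pairs seen before the
    stream ended or broke, and the parsing error if one occurred.
    """
    imports, strings, error = [], [], None
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name == "GLOBAL":
                # Older protocols: arg is the text "module name".
                module, name = arg.split(" ", 1)
                imports.append((module, name))
            elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
                # Protocol 4+: module and name are the two latest strings.
                imports.append((strings[-2], strings[-1]))
            elif isinstance(arg, str):
                strings.append(arg)
    except Exception as exc:  # truncated or deliberately broken stream
        error = exc
    return imports, error


class Demo:
    def __reduce__(self):
        return (sorted, ("cba",))  # harmless stand-in for a payload


blob = pickle.dumps(Demo())
broken = blob[:-1] + b"\xff"  # swap the final STOP for an invalid opcode

imports, error = list_imports(broken)
print(imports, "| parse error:", error is not None)
```

The reference to builtins.sorted is reported even though the stream is invalid, which mirrors the order in which a real unpickler would have acted on it.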

The Numbers Behind the Exposure

Hugging Face now hosts approximately 2.9 million public model repositories, up from roughly one million in early 2024. The first million models took over 1,000 days to accumulate from when tracking began in March 2022; the next million arrived in just 335 days. At that rate, the size of the attack surface is growing faster than any scanning tool can be updated to address it.

Protect AI's scan of over four million Hugging Face models found approximately 352,000 unsafe or suspicious issues across 51,700 distinct models. A separate analysis found a 5× year-over-year increase in the rate of malicious models being uploaded to the platform. Twenty-one per cent of all models remain exclusively in pickle format, including models from Meta, Google, Microsoft, NVIDIA, and Intel. Twenty-nine models in the current top-100 most downloaded list are pickle-only. The top 10 models from Google and Microsoft that have accepted contributions from the automated conversion bot received over 16.3 million combined downloads in a single month.

This is not a problem confined to obscure experimental repositories. The most downloaded models on the most trusted AI platform in the world are served in a format that executes arbitrary code during loading.

When the Safety Infrastructure Becomes the Attack Vector

SafeTensors, the format Hugging Face introduced in September 2022 specifically to address pickle's code execution problem, stores only tensor data with a JSON header. It cannot call functions during deserialisation. For organisations that exclusively load safetensors files from verified sources, the pickle attack class described above does not apply.
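The layout is simple enough to inspect with the standard library alone: an 8-byte little-endian length, a JSON header of that length, then raw tensor bytes. The sketch below builds a tiny example in memory and parses only the header. read_safetensors_header is a hypothetical helper, not part of the safetensors library, and it performs none of the validation the real loader does.

```python
import json
import struct


def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header of a safetensors buffer (no tensor decoding)."""
    (header_len,) = struct.unpack("<Q", data[:8])  # 8-byte little-endian length
    if header_len > len(data) - 8:
        raise ValueError("header length exceeds buffer size")
    return json.loads(data[8 : 8 + header_len].decode("utf-8"))


# Build a tiny in-memory example: one float32 tensor of shape [2].
header = {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
header_bytes = json.dumps(header).encode("utf-8")
tensor_bytes = struct.pack("<2f", 1.0, 2.0)
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + tensor_bytes

parsed = read_safetensors_header(blob)
print(parsed["weight"]["shape"])
```

Everything the loader needs is declared as data: names, dtypes, shapes, and byte offsets. There is no opcode stream and therefore nothing to execute.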

The problem is that converting a model from pickle to safetensors requires a conversion step, and Hugging Face provides an automated service for exactly that purpose. HiddenLayer documented in February 2024 that this conversion service could be hijacked using the very vulnerability it was designed to eliminate.

The conversion bot uses torch.load() internally. An attacker submits a malicious pickle model, the bot loads it, and the payload runs with the bot's own credentials. HiddenLayer demonstrated that a successful exploit could steal the SFconvertbot authentication token, which the official Hugging Face conversion service uses to submit pull requests to any repository. With that token, an attacker could submit backdoored pull requests while impersonating the official Hugging Face infrastructure. At the time of the research, the conversion bot had made 42,657 contributions across the platform. Models from Google and Microsoft had both accepted changes from it, with the ten most affected models from those two companies receiving over 16.3 million downloads in a single month.

HiddenLayer named this technique Silent Sabotage. The phrase fits: a developer downloading a model that had passed through the official conversion pipeline would have no reason to doubt what they received.

Your CI/CD Pipeline Is Not Checking

The most likely point of entry for your organisation is not a developer downloading a model through a browser and noticing something odd. It is a training job that runs overnight, an automated data pipeline pulling a checkpoint update, or a model evaluation step embedded in a CI workflow — any process where transformers.from_pretrained() or torch.load() executes unattended and unreviewed.

A developer adds a new model reference on a Friday afternoon. The pipeline runs at 03:00 Saturday. By the time anyone arrives on Monday, a reverse shell has been open for 50 hours. The model loaded cleanly. The job finished with a green status line. Nothing in the standard CI log indicates anything unusual occurred.

The broader AI package ecosystem demonstrated the same pattern repeatedly in early 2026. LiteLLM, a widely used LLM proxy library, was compromised in March 2026, potentially exposing approximately 500,000 credentials including API keys from Meta, OpenAI, and Anthropic. PyTorch Lightning was briefly compromised in April 2026 with a malicious version available for a 42-minute window before detection. The Bitwarden CLI was affected the same month with a 90-minute exposure window. In each case, organisations with automated dependency updates and no external package review would have pulled the malicious version in the ordinary course of their build process.

Namespace hijacking adds a further dimension to this. When a trusted account deletes a model repository, the username and model path become available for re-registration. An attacker who claims that namespace can distribute a malicious model to any codebase that still references the original path, with the developer having no reason to notice that the repository changed hands.

Indicators of Compromise

If you are investigating whether your environment may have been exposed, the following indicators of compromise have been identified across the current and prior campaigns.

Network infrastructure linked to the ClawHavoc operation: IP address 91.92.242.30 seen in active payload delivery; installer drop point at https://install.app-distribution.net/setup/; secondary staging URLs at http://91.92.242.30/1v07y9e1m6v7thl6 and http://91.92.242.30/6wioz8285kcbax6v; C2 domain velvet-parrot.com.

File hash MD5 a37f6403fbf28fa0b48863287f4c5a5d.

Infrastructure from the 2024 JFrog baller423 case: reverse shell endpoint 210.117.212.93:4242 and the star23/baller13 variant at 136.243.156.120:53252.
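A first pass over exported logs can be as simple as the sketch below. The IP addresses and domains are the ones listed above; the log line format, the sample lines, and the match_iocs helper are hypothetical, and exact-token matching will miss subdomains of a listed domain.

```python
import re

IOC_IPS = {"91.92.242.30", "210.117.212.93", "136.243.156.120"}
IOC_DOMAINS = {"velvet-parrot.com", "install.app-distribution.net"}

# Split each line into host-like tokens (letters, digits, dots, hyphens).
TOKEN = re.compile(r"[A-Za-z0-9.-]+")


def match_iocs(lines):
    """Return (line_number, matched_indicators) for lines naming a known IoC."""
    hits = []
    for number, line in enumerate(lines, start=1):
        tokens = {t.strip(".").lower() for t in TOKEN.findall(line)}
        matched = sorted(tokens & (IOC_IPS | IOC_DOMAINS))
        if matched:
            hits.append((number, matched))
    return hits


# Made-up sample lines in an assumed format.
sample_log = [
    "train-node-3 -> 91.92.242.30:80 ALLOW",
    "GET https://install.app-distribution.net/setup/ 200",
    "train-node-3 -> 10.0.0.9:443 ALLOW",
]
hits = match_iocs(sample_log)
print(hits)
```

In practice the same indicator set would be loaded into a SIEM or EDR query rather than a script, but the matching logic is the same.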

Beyond specific indicators, hunt for: PyTorch model loads without a weights_only=True flag; transformers.from_pretrained() calls with trust_remote_code=True in any automated job; 7z-compressed model archives in your download cache; outbound connections from machine learning infrastructure to unknown IPv4 addresses on non-standard ports.
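The first two of those hunts can be approximated with a static pass over your own source code. The sketch below uses Python's ast module. risky_calls is a hypothetical helper, and it only recognises the literal spelling torch.load(...) and a literal True, so aliased imports or flags passed through variables would need a more thorough tool.

```python
import ast


def _kw_is_true(call: ast.Call, name: str) -> bool:
    """True if the call passes name=True as a literal keyword argument."""
    for kw in call.keywords:
        if kw.arg == name:
            return isinstance(kw.value, ast.Constant) and kw.value.value is True
    return False


def risky_calls(source: str):
    """Return sorted (line, reason) pairs for loader calls worth reviewing."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Attribute):
            continue
        func = node.func
        is_torch_load = (
            func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if is_torch_load and not _kw_is_true(node, "weights_only"):
            findings.append((node.lineno, "torch.load without weights_only=True"))
        if func.attr == "from_pretrained" and _kw_is_true(node, "trust_remote_code"):
            findings.append((node.lineno, "from_pretrained with trust_remote_code=True"))
    return sorted(findings)


sample = '''
import torch
from transformers import AutoModel
a = torch.load("model.pt")
b = torch.load("model.pt", weights_only=True)
c = AutoModel.from_pretrained("org/name", trust_remote_code=True)
'''
findings = risky_calls(sample)
for line, reason in findings:
    print(line, reason)
```

Only the source text is parsed, so the scan never imports torch or transformers and never touches a model file.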

Steps Worth Taking Now

Pin every model by its full commit hash in your code, not by name alone. A name resolves to whatever the repository currently contains; a commit hash is fixed to a specific state. Check whether your CI pipeline passes weights_only=True to every torch.load() call. PyTorch introduced this flag to restrict deserialisation to tensors and a small allowlist of primitive types, which blocks the __reduce__-based payloads described above. PyTorch 2.6 and later default to weights_only=True, but older versions do not, so set the flag explicitly rather than relying on the installed version.
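A pinning policy is easy to enforce mechanically. The sketch below assumes your model references live in some mapping of repository name to revision. The repository names and the hash are made-up placeholders, and is_pinned is a hypothetical helper.

```python
import re

FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned(revision) -> bool:
    """True only for a full 40-character hexadecimal commit hash."""
    return isinstance(revision, str) and FULL_SHA.fullmatch(revision) is not None


# Hypothetical manifest: repository name -> revision used by the pipeline.
model_refs = {
    "org/model-a": "main",
    "org/model-b": "0123456789abcdef0123456789abcdef01234567",
}
unpinned = sorted(name for name, rev in model_refs.items() if not is_pinned(rev))
print(unpinned)
```

A revision that passes this check is then passed as the revision argument of from_pretrained() or huggingface_hub's hf_hub_download(). Branch names, tags, and abbreviated hashes are rejected on purpose, since the first two can be moved and the last is ambiguous.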

Audit which pipelines run with trust_remote_code=True. That flag instructs the Transformers library to download and execute code from the model repository directly. Where it is necessary, treat each instance as a named decision with a reviewer attached, not a default setting carried forward from an old tutorial. Run PickleScan as a CI gate before any model reaches a build step, and search your model directories for 7z-compressed archives, which are not standard packaging and may indicate a nullifAI-style payload waiting for a manual extraction.
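The archive search does not need to trust file extensions, because 7z files begin with a fixed six-byte signature. A minimal sketch, with find_7z_archives as a hypothetical helper and a temporary directory standing in for a real model cache:

```python
import tempfile
from pathlib import Path

SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"  # 7z file signature


def find_7z_archives(root):
    """Return files under root whose first bytes match the 7z signature."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            with path.open("rb") as handle:
                if handle.read(len(SEVEN_ZIP_MAGIC)) == SEVEN_ZIP_MAGIC:
                    hits.append(path)
    return sorted(hits)


# Demo: one 7z-signed file with a misleading extension, one ZIP-signed file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model.bin").write_bytes(SEVEN_ZIP_MAGIC + b"\x00" * 16)
    (root / "weights.pt").write_bytes(b"PK\x03\x04" + b"\x00" * 16)
    names = [p.name for p in find_7z_archives(root)]
print(names)
```

A hit is a prompt for manual review, not proof of compromise; some teams do ship legitimate 7z archives.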

How Defendis Can Help

Attacks like this one rarely announce themselves through official channels first. New payloads, fresh infrastructure, and updated distribution methods circulate in closed forums and private channels well before any public research surfaces them. By the time an incident makes it into a threat report, organisations without early visibility are already behind.

Defendis gives your security team that early visibility. We monitor the dark web, underground forums, and threat actor channels so your team receives relevant intelligence before it becomes breaking news: context about emerging threats, matched against your organisation's exposure, without requiring your analysts to spend time in places they should not have to go.

If your organisation is building with AI tools, integrating third-party models, or simply trying to keep up with a threat landscape that moves faster than weekly briefings allow, that kind of continuous monitoring changes what your team is able to act on.

Frequently Asked Questions

Is the safetensors format actually safe to use?

SafeTensors was built specifically to eliminate the pickle code execution problem. A properly formed safetensors file stores only tensor data and a JSON header. Nothing in the format can trigger a function call during loading. For models available in safetensors format from a verified source, the attack class described in this article does not apply. The caveat is that roughly 21% of Hugging Face models remain exclusively pickle-based, including models from large organisations. When only a pickle version exists, your options are: review the file manually before loading, pass weights_only=True to restrict deserialisation to tensors only, or wait until a safetensors version is published before pulling the model into a production pipeline.

Should we block all access to Hugging Face at the firewall?

For most organisations, blocking Hugging Face entirely is not realistic, because too many legitimate workflows depend on it. The more targeted approach is to prohibit unapproved model downloads in production pipelines, require every model reference to be pinned to a reviewed commit hash, and run PickleScan as a CI gate. Outbound firewall rules that flag connections from ML infrastructure to unknown IPv4 addresses on non-standard ports can catch the most common reverse shell patterns before they become a persistent session.

How do I know if my organisation has already been affected?

Start with the indicators listed above. Run them against your endpoint detection tools, firewall logs, and SIEM. A match warrants immediate isolation of the affected system before any further action. An absence of matches does not confirm you are clean; it means only that the known indicators were not present. The baller423 reverse shell opened silently with no error visible in the model loading process. Behavioural analysis looking for unexpected outbound connections from processes that loaded model files is the more reliable detection path.

What is the fastest way to reduce exposure?

Add weights_only=True to every torch.load() call in your codebase. This takes less than an hour across most projects and blocks the most common pickle-based code execution technique. Audit every pipeline for trust_remote_code=True flags and require explicit sign-off for each instance. Search your download caches and model directories for 7z-compressed archives. These three steps address the specific techniques documented in current campaigns and can be completed in a single working day.

Is this relevant for organisations that do not use AI internally?

Supply chain attacks of this type can affect any organisation whose software vendors or service providers build with AI tools. A company that does not run Hugging Face models directly may still be downstream of a vendor or partner that does. If you have not confirmed that your software supply chain includes model verification steps for any AI components it contains, the risk is present even when it is indirect.

How often should we review threat intelligence on AI supply chain threats?

The timeline on these campaigns is short. The privacy-filter model reached 244,000 downloads within eighteen hours of appearing. The PyTorch Lightning compromise had a 42-minute window. Weekly threat intelligence reviews are a minimum for teams with AI components in production. For teams actively building with AI tools and pulling new models regularly, automated alerting is more appropriate than a periodic manual review. The gap between upload and in-production exploitation is measured in hours, not weeks.

About the author
Sami Malik is a copywriter passionate about crafting clear, engaging, and impactful content that helps brands connect with their audience through storytelling and strategy.
