Our security research and data science teams detected a resurgence of a malicious repo confusion campaign that began in the middle of last year, this time on a much larger scale. The attack involves more than 100,000 GitHub repositories (and presumably millions) and takes effect when unsuspecting developers use repositories that resemble known and trusted ones but are, in fact, infected with malicious code.
Similar to dependency confusion attacks, malicious actors get their target to download their malicious version instead of the real one. But dependency confusion attacks take advantage of how package managers work, while repo confusion attacks simply rely on humans to mistakenly pick the malicious version over the real one, sometimes employing social engineering techniques as well.
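Because repo confusion depends on a person picking a look-alike repository, one simple safeguard is to compare a clone URL against an explicit allowlist before using it. The following is a minimal sketch under stated assumptions, not a description of any particular tool: the allowlist entry `example-org/example-project` and the helper names `repo_slug` and `is_trusted` are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical allowlist of "owner/name" slugs your team has vetted.
TRUSTED_REPOS = {"example-org/example-project"}


def repo_slug(url: str) -> str | None:
    """Return the lowercase "owner/name" slug of an https GitHub URL, else None."""
    parsed = urlparse(url)
    # Require the exact host, so look-alike hosts such as
    # "github.com.evil.test" are rejected.
    if parsed.scheme != "https" or parsed.hostname != "github.com":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2:
        return None
    owner, name = parts
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return f"{owner}/{name}".lower()


def is_trusted(url: str, trusted: set[str] = TRUSTED_REPOS) -> bool:
    """True only when the URL resolves to a slug on the allowlist."""
    slug = repo_slug(url)
    return slug is not None and slug in trusted
```

An exact-match allowlist is deliberately strict: a single changed character in the owner or repository name fails the check, which is the kind of difference a person can easily overlook.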
In this case, in order to maximize the chances of infection, the malicious actor is flooding GitHub with malicious repos, following these steps:
Once unsuspecting developers use any of the malicious repos, the hidden payload unpacks seven layers of obfuscation, a process that also involves pulling malicious Python code and, later, a binary executable. The malicious code (largely a modified version of BlackCap-Grabber) then collects login credentials from different apps, browser passwords and cookies, and other confidential data. It sends this data back to the malicious actors’ C&C (command-and-control) server and performs a long series of additional malicious activities.
Most of the forked repos are quickly removed by GitHub, which identifies the automation. However, the automation detection seems to miss many repos, and the ones that were uploaded manually survive. Because the whole attack chain seems to be mostly automated on a large scale, the 1% that survive still amount to thousands of malicious repos. You can check out a small portion of the current wave yourself by simply searching the following in GitHub: 🔥 2024 language:python.
Counting the removed ones, the number of repos reaches millions. Usually the removal happens a few hours after the upload, so it’s challenging to document them. We know the removal is automated because many of the original ones still exist, and it mainly targets the fork bombs. For example, here you can see thousands of forks appear in the summary but none in the details.
Because of the operation’s large scope, this campaign has a sort of 2nd-order social engineering network effect: every now and then, naive users fork the malicious repos without realizing they are spreading malware. It is kind of ironic to see it spread by humans after such heavy reliance on automation.
Here is a brief history of this malicious campaign:
May 2023: As originally reported by Phylum, several malicious packages were uploaded to PyPI containing early parts of the current payload. These packages were spread by 'os.system("pip install package")' calls planted in forks of popular GitHub repos, such as 'chatgpt-api'.
July – August 2023: Several malicious repos were uploaded to GitHub, this time delivering the payload directly instead of through importing PyPI packages. This came after PyPI removed the malicious packages, and the security community increased its focus there. Aliakbar Zahravi and Peter Girnus from Trend Micro published a great technical analysis of it.
November 2023 – Now: We have detected more than 100,000 repos containing similar malicious payloads, and the number keeps growing. This attack approach has several advantages:
Judging by the many incidents we have observed in package managers and SCM platforms, the transition of this campaign from malicious packages in PyPI to malicious GitHub repos seems to reflect a general trend. The security community now puts extra focus on package managers, so a shift by attackers toward less scrutinized channels was to be expected.
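The May 2023 delivery mechanism in the history above, 'os.system("pip install package")' calls planted in forked repos, is simple enough to look for statically. The sketch below is one illustrative heuristic, not the detection method used in this research: it parses Python source with the standard `ast` module and reports `os.system` calls whose literal string argument contains "pip install". The function name `find_pip_install_calls` is hypothetical.

```python
from __future__ import annotations

import ast


def find_pip_install_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, command) for os.system("... pip install ...") calls.

    Only calls written literally as os.system(<string literal>) are matched.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call) or not node.args:
            continue
        func = node.func
        is_os_system = (
            isinstance(func, ast.Attribute)
            and func.attr == "system"
            and isinstance(func.value, ast.Name)
            and func.value.id == "os"
        )
        if not is_os_system:
            continue
        arg = node.args[0]
        if (
            isinstance(arg, ast.Constant)
            and isinstance(arg.value, str)
            and "pip install" in arg.value
        ):
            findings.append((node.lineno, arg.value))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(findings)


if __name__ == "__main__":
    sample = 'import os\nos.system("pip install package")\n'
    for line, command in find_pip_install_calls(sample):
        print(f"line {line}: {command}")
```

This check is easy to evade, for example by aliasing the import, building the command string at runtime, or adding layers of obfuscation like those in the current payload. Treat a match as a prompt for manual review, not as proof that a repo is clean or malicious.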
The ease of automatically generating accounts and repos on GitHub and similar platforms, using convenient APIs and soft rate limits that are easy to bypass, combined with the huge number of repos to hide among, makes it a perfect target for covertly infecting the software supply chain.
This campaign, along with dependency confusion campaigns plaguing package registries and generally malicious code being spread through source control managers, demonstrates how fragile software supply chain security is, despite the abundance of tools and available security mechanisms.
Cloudflare was notified and deactivated the DNS records of the malicious addresses found.
GitHub was notified, and most of the malicious repos were deleted, but the campaign continues, and attacks that attempt to inject malicious code into the supply chain are becoming increasingly prevalent. There are countless solutions for catching malware at the system and network levels, but the supply chain remains a massive and lucrative attack surface for malicious actors. If you encounter any malicious repo, part of this campaign or not, we encourage you to report it.
At Apiiro, we’ve built a malicious code detection system that monitors any connected codebases. It detects attacks through deep code analysis that combines multiple advanced techniques: LLM-based code analysis, deconstruction of the code into a complete execution flow graph, an elaborate heuristics engine, dynamic decoding, decryption, and de-obfuscation, and more, so it’s pretty hard to fool.
Without monitoring your code for injected malicious payloads, the security of your whole organization depends on things like your developers’ ability to avoid choosing a malicious repo that looks almost identical to the real one, not having a single CI/CD misconfiguration, having 100% secure 3rd party code, and other impossible conditions. That’s why we as an industry need to start going beyond typical vulnerability detection and ingestion to surface the next generation of software supply chain and application risks.