In October 2021, the npm package UAParser.js was being downloaded about 7 million times per week. Then the maintainer's npm password was compromised and sold on a notorious Russian hacking forum.
Malware was added that executed immediately whenever anyone installed one of the malicious versions. It ran a cryptocurrency miner on the machine (unless the IP address was in Russia, Ukraine, Belarus, or Kazakhstan, presumably to avoid attracting the attention of authorities in those countries). But that's not all. It also stole passwords from 100 different programs on the machine, as well as the credentials stored in the Windows Credential Manager.
Beyond direct downloads, UAParser.js is used by over 3 million GitHub repositories, and many libraries from large companies such as Facebook were affected because they depended on it. Anyone who installed or upgraded an affected package while the malicious versions were live would have been infected.
Luckily, the maintainer spotted the problem quickly and had the malicious versions removed within hours, so the impact was limited. But this is just the tip of the iceberg: 150 packages were removed for security reasons in May 2022 alone. The trend seems to be accelerating, and attackers are exploiting the trust built into the open source ecosystem.
Where are the vulnerabilities?
The way we write software has changed a lot in the last decade. We use dependencies far more liberally, and most projects now end up with thousands of them. In fact, as much as 90 percent of an app's code comes from open source.
The proliferation of open-source dependencies
A 2019 paper found that installing an average npm package implicitly trusts 79 third-party packages and 39 maintainers, a surprisingly large attack surface. And that's just one package. Discord, a popular chat application, consists of 19,000 packages in total, with code from 300,000 different contributors across 206 countries.
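The 79-package figure is an average; it's easy to see the equivalent numbers for your own project. Below is a minimal sketch in Python, assuming a `package-lock.json` in lockfile format version 2 or 3, where the root project sits under the `""` key of `"packages"` and every installed package has a `node_modules/...` key. The function names are hypothetical, not part of any npm tooling.

```python
import json


def load_lockfile(path: str) -> dict:
    """Read a package-lock.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def package_name(lock_key: str) -> str:
    """Map a key like "node_modules/a/node_modules/@scope/b" to "@scope/b"."""
    return lock_key.rsplit("node_modules/", 1)[-1]


def dependency_footprint(lock: dict) -> dict:
    """Compare what you asked for (direct deps) with what actually gets installed."""
    packages = lock.get("packages", {})
    root = packages.get("", {})
    direct = set(root.get("dependencies", {})) | set(root.get("devDependencies", {}))
    # Every installed package, including nested copies, has a node_modules/ key.
    installed = [key for key in packages if "node_modules/" in key]
    return {
        "direct": len(direct),
        "installed_entries": len(installed),
        "distinct_names": len({package_name(key) for key in installed}),
    }


# Hand-written, heavily trimmed lockfile for illustration; real trees are far larger.
sample_lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "my-app", "dependencies": {"express": "^4.18.2"}},
        "node_modules/express": {"version": "4.18.2"},
        "node_modules/debug": {"version": "2.6.9"},
        "node_modules/ms": {"version": "2.1.3"},
        "node_modules/debug/node_modules/ms": {"version": "2.0.0"},
    },
}
print(dependency_footprint(sample_lock))
# {'direct': 1, 'installed_entries': 4, 'distinct_names': 3}
```

On a real project, `dependency_footprint(load_lockfile("package-lock.json"))` makes the gap between what you chose and what you trust concrete.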
Open source works because anyone can inspect the code, contribute, and publish a package. The problem is that no one reads code today. We download code from the internet, written by people we don't know, without reading it. We then run these packages with full permissions on our laptops and servers, where we keep our most important data. It's a miracle that this system works at all.
npm doesn't make it easy to read a package's code, so developers often resort to clicking through to the GitHub repository instead. But attackers can publish different code to npm than they push to GitHub, and npm does not guarantee that the two match. So even developers who do check the source on GitHub may not be reading the code they actually install.
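One way to check whether published code matches the repository is to download the tarball (for example with `npm pack <name>@<version>`, which saves the registry tarball to the current directory) and compare file hashes against a checkout of the matching git tag. This sketch assumes the tarball stores files under a top-level `package/` directory, the usual npm layout; the function names are hypothetical. A difference is not proof of malice, since many packages legitimately publish build output that isn't committed, but an unexpected extra script deserves a closer look.

```python
import hashlib
import tarfile
from pathlib import Path


def hash_tarball(tgz_path: str, prefix: str = "package/") -> dict[str, str]:
    """SHA-256 of each regular file in an npm tarball, keyed by path inside the package."""
    digests = {}
    with tarfile.open(tgz_path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            name = member.name[len(prefix):] if member.name.startswith(prefix) else member.name
            data = tar.extractfile(member).read()
            digests[name] = hashlib.sha256(data).hexdigest()
    return digests


def hash_directory(root: str) -> dict[str, str]:
    """SHA-256 of each file under a source checkout, skipping the .git directory."""
    base = Path(root)
    digests = {}
    for path in base.rglob("*"):
        rel = path.relative_to(base)
        if path.is_file() and ".git" not in rel.parts:
            digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(published: dict[str, str], source: dict[str, str]) -> dict[str, list[str]]:
    """Files that exist only in the published tarball, and files whose contents differ."""
    return {
        "only_in_published": sorted(published.keys() - source.keys()),
        "differs": sorted(k for k in published.keys() & source.keys() if published[k] != source[k]),
    }
```

Usage would look like `compare(hash_tarball("some-pkg-1.2.3.tgz"), hash_directory("some-pkg-checkout"))`, where both paths are placeholders.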
This may help explain why a malicious package stays available for 209 days on average before it's publicly reported, and why 20% of malware persists for over 400 days and accumulates more than 1,000 downloads.
There are tools on the market, but scanning for known vulnerabilities is too reactive to stop an active supply chain attack. It's never OK to ship malware to production: you must catch it before you install it, and vulnerability scanners will not flag these attacks early enough. A vulnerability can take weeks or months to be discovered, yet code can be deployed in minutes.
Attack vectors are how the attacker tricks you into installing malicious code.
The most common vector is the hijacked package. This can happen in several ways: a maintainer uses a weak password, a maintainer's computer gets infected with malware, a maintainer mistakenly gives access to a malicious actor, or the maintainer becomes malicious themselves. All of this is made worse by the fact that npm doesn't require 2FA from every maintainer, though this has started to improve recently.
The next vector is typo-squatting, a particularly nefarious trick. There are two packages on npm: noblox.js-proxied and noblox.js-proxy. Looking at these two names, you might be hard-pressed to say which is real and which is fake. But if you made the mistake of installing the fake package, you would be greeted with a supply chain attack.
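Tooling can at least flag names that are suspiciously close to packages you already trust. Here is a minimal sketch using Levenshtein edit distance; the `known` list and the distance threshold are assumptions you would tune for your own project. A match only tells you that two names are easy to confuse, not which one is genuine.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def possible_typosquats(candidate: str, known: list[str], max_distance: int = 2) -> list[str]:
    """Known package names that `candidate` is close to, but not identical to."""
    return [
        name for name in known
        if name != candidate and edit_distance(candidate, name) <= max_distance
    ]


print(edit_distance("noblox.js-proxy", "noblox.js-proxied"))  # 3
print(possible_typosquats("expres", ["express", "lodash"]))    # ['express']
```

Note that the noblox pair is 3 edits apart, so a threshold of 2 would miss it; longer names generally need a looser threshold, which in turn produces more false positives.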
Dependency confusion is closely related to typo-squatting. But instead of relying on the user mistyping a dependency name, this attack targets companies that publish packages to their own private npm registry under names that haven't been registered on the public registry. An attacker can register a package with the same name on the public registry, and later some internal tools may get confused and install the public version instead of the internal one. All kinds of organizations have been affected by attacks like these, including very large companies and the federal government.
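A standard mitigation is to publish internal packages under an npm scope and map that scope to the private registry in `.npmrc` (a line such as `@yourcompany:registry=https://npm.internal.example.com/`, with a placeholder URL), so npm never looks for those names on the public registry. As an extra check, you can verify where your lockfile says internal packages actually came from. This sketch assumes a lockfile version 2/3 `package-lock.json` with `resolved` tarball URLs; the internal package names and registry host are hypothetical.

```python
from urllib.parse import urlparse


def confused_dependencies(lock: dict, internal_names: set[str], internal_host: str) -> list[str]:
    """Internal package names whose lockfile entry was NOT resolved from the internal registry."""
    suspicious = []
    for key, entry in lock.get("packages", {}).items():
        if "node_modules/" not in key or entry.get("link"):
            continue  # skip the root project and workspace symlinks
        name = key.rsplit("node_modules/", 1)[-1]
        resolved = entry.get("resolved", "")
        if name in internal_names and urlparse(resolved).hostname != internal_host:
            suspicious.append(f"{name} <- {resolved or '(no resolved URL)'}")
    return sorted(suspicious)


# Hypothetical example: two internal packages, one of which came from the public registry.
lock = {
    "lockfileVersion": 3,
    "packages": {
        "": {"name": "internal-app"},
        "node_modules/express": {
            "resolved": "https://registry.npmjs.org/express/-/express-4.18.2.tgz"},
        "node_modules/acme-auth": {
            "resolved": "https://npm.internal.example.com/acme-auth/-/acme-auth-1.2.0.tgz"},
        "node_modules/acme-billing": {
            "resolved": "https://registry.npmjs.org/acme-billing/-/acme-billing-99.0.0.tgz"},
    },
}
for problem in confused_dependencies(lock, {"acme-auth", "acme-billing"}, "npm.internal.example.com"):
    print(problem)
# acme-billing <- https://registry.npmjs.org/acme-billing/-/acme-billing-99.0.0.tgz
```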
Tactics are what the code actually does when it runs.
The most common tactic is the install script. A 2022 paper found that almost 94% of malicious packages had at least one install script. Unfortunately, install scripts also have legitimate uses, so simply disabling them is not an easy fix.
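npm runs the `preinstall`, `install`, and `postinstall` scripts declared in a package's `package.json` automatically during installation, and `npm install --ignore-scripts` turns this off. A useful first step is simply knowing which installed packages declare them. This minimal sketch walks a `node_modules` directory; it misses some cases, such as the `node-gyp rebuild` that npm runs by default for packages that ship a `binding.gyp` file without their own install script.

```python
import json
from pathlib import Path

# Lifecycle scripts that npm runs automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_scripts(manifest: dict) -> dict[str, str]:
    """Return the install-time scripts declared in one package.json manifest."""
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


def scan_node_modules(root: str = "node_modules") -> dict[str, dict[str, str]]:
    """Map each package that declares install scripts to those scripts."""
    found = {}
    for manifest_path in sorted(Path(root).rglob("package.json")):
        try:
            manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable file or invalid JSON: skip rather than crash
        if not isinstance(manifest, dict):
            continue
        hooks = install_scripts(manifest)
        if hooks:
            found[manifest.get("name", str(manifest_path))] = hooks
    return found
```

Running `scan_node_modules()` from a project root prints nothing by itself; iterate over its result to review each package's install-time commands.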
The second tactic is stealing data. A common pattern is for a script to send process.env, which holds your environment variables, to an attacker-controlled domain. Because an outbound HTTP request may be blocked by a firewall, some scripts also use a DNS technique: they encode the data into the subdomain of a hostname and send it again as a DNS lookup. Either way, your tokens, keys, and other environment variables are exfiltrated by the rogue script. Once attackers have your data, they can post it online, use it to break into your systems, delete it, or hold it for ransom.
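On the defensive side, the DNS technique leaves a recognizable trace: encoded data tends to produce long, random-looking subdomain labels (each DNS label can be at most 63 characters). A common monitoring heuristic flags such names by their character entropy. This sketch assumes you already have a list of queried hostnames, for example from resolver logs; the length and entropy thresholds are illustrative guesses, and treating the last two labels as the registered domain is a simplification that breaks for suffixes like `co.uk`.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits per character, computed from the text's own character frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def looks_like_dns_exfiltration(hostname: str, min_label_len: int = 30,
                                min_entropy: float = 3.5) -> bool:
    """Flag hostnames with a long, high-entropy subdomain label (thresholds are guesses)."""
    labels = hostname.rstrip(".").split(".")
    subdomain_labels = labels[:-2]  # crude: assume the last two labels are the registered domain
    return any(
        len(label) >= min_label_len and shannon_entropy(label) >= min_entropy
        for label in subdomain_labels
    )


print(looks_like_dns_exfiltration("www.example.com"))  # False
print(looks_like_dns_exfiltration("abcdefghijklmnopqrstuvwxyz0123456789.attacker.example"))  # True
```

Expect false positives from legitimately long generated hostnames (some CDNs and cloud services use them), so this works best as a signal to investigate rather than a blocklist.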
The first thing you can do is choose better dependencies. Most of us aren't going to read the actual code, though, so we rely on heuristics like downloads, documentation, GitHub stars, and tests.
Sometimes a package checks all these boxes and is still compromised. This is where tools like Socket can dig into the contents of a package and tell you what it does. For a package such as bufferutil, for example, it can show what code will run automatically on installation.
Here's an example of a package called tiktok_embed. It reads environment variables and simply sends them off to a server on the internet.
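Reading environment variables and making network requests are each common on their own; the combination inside a dependency is what deserves a second look. The sketch below is a deliberately crude, regex-based version of that idea for JavaScript source, and it is not Socket's implementation. Regexes are easy to evade (for example with obfuscated or dynamically built `require` calls), which is why real tools analyze the parsed code. It also only scans `.js` files, not `.cjs` or `.mjs`.

```python
import re
from pathlib import Path

ENV_ACCESS = re.compile(r"\bprocess\.env\b")
NETWORK_ACCESS = re.compile(
    r"""\brequire\(\s*['"](?:node:)?(?:https?|net|dns)['"]\s*\)"""  # CommonJS require
    r"""|\bfrom\s+['"](?:node:)?(?:https?|net|dns)['"]"""         # ES module import
    r"""|\bfetch\s*\("""                                            # global fetch()
)


def reads_env_and_uses_network(source: str) -> bool:
    """True if a JavaScript source string touches both process.env and a network API."""
    return bool(ENV_ACCESS.search(source)) and bool(NETWORK_ACCESS.search(source))


def flag_package(package_dir: str) -> list[str]:
    """Relative paths of .js files in a package directory that match both patterns."""
    base = Path(package_dir)
    return sorted(
        p.relative_to(base).as_posix()
        for p in base.rglob("*.js")
        if reads_env_and_uses_network(p.read_text(encoding="utf-8", errors="ignore"))
    )
```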
If you update your dependencies too slowly, you're exposed to known vulnerabilities. If you update too quickly, you're exposed to supply chain attacks, because you're running code that no one has looked at yet. It comes down to balancing these trade-offs, but at a minimum it's something to discuss as a team and agree on a policy for.
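One way to encode such a policy is a cooldown: only adopt a release once it has been public for some number of days, giving the community time to spot a hijacked version. This minimal sketch assumes you already have a package's publish timestamps, shaped like the `time` object in npm registry package metadata (each version mapped to an ISO 8601 timestamp, plus `created` and `modified` keys). The seven-day window and the version history are made-up examples. npm's own `before` config option offers a related, date-based control.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def newest_mature_version(times: dict[str, str], min_age_days: int,
                          now: datetime) -> Optional[str]:
    """Most recently published version that is at least `min_age_days` old, or None.

    Orders by publish time, not semver, so a late backport to an old major line
    could win; a real policy would also filter by the allowed version range.
    """
    cutoff = now - timedelta(days=min_age_days)
    candidates = []
    for version, stamp in times.items():
        if version in ("created", "modified"):
            continue  # metadata keys, not versions
        published = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        if published <= cutoff:
            candidates.append((published, version))
    return max(candidates)[1] if candidates else None


# Hypothetical publish history for a package.
times = {
    "created": "2021-01-01T00:00:00.000Z",
    "modified": "2021-10-22T00:00:00.000Z",
    "1.0.0": "2021-01-01T00:00:00.000Z",
    "1.0.1": "2021-06-01T00:00:00.000Z",
    "1.1.0": "2021-10-22T00:00:00.000Z",
}
now = datetime(2021, 10, 23, tzinfo=timezone.utc)
print(newest_mature_version(times, min_age_days=7, now=now))  # 1.0.1
```

The trade-off from the paragraph above shows up directly in `min_age_days`: a longer window means more time for a malicious release to be caught, and a slower path to legitimate security fixes.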
Audit every dependency
Again, there's a trade-off here. You can do a full audit: read every single line of code of every dependency in your project. That's thorough and the best-in-class option, but it's a lot of work, and auditing that much code properly is slow and therefore expensive.
At the other extreme, a lot of teams do nothing, which leaves them vulnerable to supply chain attacks. That can be expensive too, in public relations damage to the company or in the costs of a breach. The happy medium is to lean on automation.
What we recommend is using static analysis to audit every dependency and detect signs that a package is doing something suspicious, such as using privileged APIs or containing obfuscated code. Socket does this with a bot that comments on your pull request with the issues found in each dependency, so the developer can make an informed decision about it.
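A pull request bot first has to work out which dependency code a change actually introduces, so that review effort goes to new and upgraded packages rather than the whole tree. Here is a minimal sketch of that first step, comparing the base branch's and the pull request's `package-lock.json` (lockfile version 2/3 layout assumed; the example lockfiles are hypothetical). It only produces the list to audit; analyzing each package is a separate step, and this is not Socket's implementation.

```python
def installed_versions(lock: dict) -> dict[str, str]:
    """Map each installed lockfile path (e.g. "node_modules/debug") to its version."""
    return {
        key: entry.get("version", "")
        for key, entry in lock.get("packages", {}).items()
        if "node_modules/" in key
    }


def short_name(lock_key: str) -> str:
    """Strip the node_modules/ path prefix from a lockfile key."""
    return lock_key.rsplit("node_modules/", 1)[-1]


def dependencies_to_audit(base_lock: dict, head_lock: dict) -> dict[str, list[str]]:
    """Packages a pull request adds or upgrades: the new code that needs a look."""
    before, after = installed_versions(base_lock), installed_versions(head_lock)
    added = sorted(f"{short_name(k)}@{after[k]}" for k in after.keys() - before.keys())
    changed = sorted(
        f"{short_name(k)}: {before[k]} -> {after[k]}"
        for k in after.keys() & before.keys()
        if before[k] != after[k]
    )
    return {"added": added, "changed": changed}


base = {"packages": {
    "": {},
    "node_modules/express": {"version": "4.18.1"},
    "node_modules/debug": {"version": "2.6.9"},
}}
head = {"packages": {
    "": {},
    "node_modules/express": {"version": "4.18.2"},
    "node_modules/debug": {"version": "2.6.9"},
    "node_modules/left-pad": {"version": "1.3.0"},
}}
print(dependencies_to_audit(base, head))
# {'added': ['left-pad@1.3.0'], 'changed': ['express: 4.18.1 -> 4.18.2']}
```

Removed packages are deliberately ignored, since deleting a dependency can't introduce new code.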
By understanding the threats that exist, we can take steps as teams to protect ourselves and our software supply chains.
This requires a mindset shift around dependencies. Instead of thinking of them as free and safe, we need to treat them as part of our apps and protect ourselves accordingly.