If you’ve ever used the Python programming language, or installed software written in Python, you’ve probably used PyPI, even if you didn’t realise it at the time.
PyPI is short for the Python Package Index, and it currently contains just under 300,000 open source add-on modules (290,614 of them when we checked [2021-03-07T00:10Z]).
You can download and install any of these modules automatically just by issuing a command such as pip install [nameofpackage], or by letting a software installer fetch the missing components for you.
The full list includes, to put it plainly, some peculiar projects, with the first five in alphanumeric order being…
0
0-._.-._.-._.-._.-._.-._.-0
00000a
0.0.1
007
…and the final five doing their very best to be last on the list:
zzzfs
zzzutils
zzz-web
zzzz
zzzZZZzzz
Grab-a-package
As you probably know, many contemporary programming ecosystems such as Python, Node.js and Ruby provide huge, free, public repositories of this sort, and come with easy-to-use tools to fetch all the add-on modules you need and install them automatically.
If you suddenly realise you want to use a Python module called asteroid, for example, you can just do pip install asteroid, after which your own Python programs can say import asteroid and start making use of the package.
The package asteroid is not a look-alike of the game Asteroids by Atari, by the way, nor is it related to astronomy. It’s an audio processing system that claims to be able to separate voice recordings with multiple participants into separate channels for each speaker.
Malicious updates
The ease with which trusting users download and install new Python (and Node.js, and Ruby, etc.) components has led to a range of cybercriminal attacks against package managers.
Crooks sometimes Trojanise the repository of a legitimate project, typically by guessing or cracking the password of a package owner’s account, or by helpfully but dishonestly offering to “assist” with a project that the original owner no longer has time to look after.
Once the fake version is uploaded to the genuine repository, users of the now-hacked package automatically get infected as soon as they update to the new version, which works just as it did before, except that it includes hidden malware for the crooks to exploit.
Another trick involves creating Trojanised public versions of private packages that the attacker knows are used internally by a software company.
The public version of the package is given a higher version number than the internal version, and if the company hasn’t secured its auto-updating processes correctly, the attacker may be able to trick a company’s whole development team, or even the organisation’s official software build system, into updating private code from an untrusted (and malicious) external source.
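If you’re wondering why a bigger version number is all it takes, here’s a minimal Python sketch of a deliberately naive package resolver that gathers candidates from every configured index and simply picks the highest version. Everything in it is hypothetical: the index names, the package name acme-billing and the version numbers are made up, and the version parsing ignores most of the rules that real tools such as pip follow (PEP 440).

```python
# Sketch of dependency confusion using a hypothetical, deliberately naive
# resolver. Index names, package names and versions are made up; real
# installers such as pip use PEP 440 versions and many more rules.
from typing import Dict, List, Tuple

Indexes = Dict[str, Dict[str, List[str]]]  # index -> package -> versions


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as '1.2.0' (no pre-releases)."""
    return tuple(int(part) for part in text.split("."))


def naive_resolve(name: str, indexes: Indexes) -> Tuple[str, str]:
    """Pick the highest version of `name` found on ANY index."""
    candidates = [
        (parse_version(version), index_name, version)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    # max() raises ValueError if no index offers the package at all
    _, index_name, version = max(candidates)
    return index_name, version


def pinned_resolve(name: str, indexes: Indexes, pins: Dict[str, str]) -> Tuple[str, str]:
    """Like naive_resolve, but a pinned package may only come from its pinned index."""
    if name in pins:
        indexes = {pins[name]: indexes[pins[name]]}
    return naive_resolve(name, indexes)


EXAMPLE_INDEXES: Indexes = {
    "internal": {"acme-billing": ["1.1.0", "1.2.0"]},
    # The attacker's public upload simply claims a huge version number
    "public": {"acme-billing": ["99.0.0"], "requests": ["2.25.1"]},
}
EXAMPLE_PINS = {"acme-billing": "internal"}

if __name__ == "__main__":
    print(naive_resolve("acme-billing", EXAMPLE_INDEXES))
    print(pinned_resolve("acme-billing", EXAMPLE_INDEXES, EXAMPLE_PINS))
```

Running it prints ('public', '99.0.0') for the naive resolver and ('internal', '1.2.0') for the pinned one. The pinned_resolve function shows the general idea of the fix: if a package name is private, only ever look for it in your own index, so that no public upload can satisfy it, whatever version number it claims.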
Cybersecurity researcher Alex Birsan famously made well over $100,000 in bug bounties recently by feeding external versions of supposedly internal software packages into dozens of IT giants including Apple, PayPal, Microsoft and Shopify.
This sort of trick is known as a supply chain attack, for obvious reasons.
In a supply chain attack, the crooks don’t break into your network and install the malware directly.
Instead, they insert their malware upstream from you, implanting it into someone else’s network, repository or delivery mechanism and waiting for the infection to pass down the chain until it reaches you.
Package squatting
A third sort of supply chain attack – one that is rather less sophisticated and has no guarantee of success, yet is extremely easy to pull off – is to create a fake package with a misleading name that users in a hurry might download and install by mistake.
Just like typosquatting in the website world, where crooks register near-miss domain names in the hope you won’t notice you’re on the wrong site (e.g. writing c0mpany instead of company), package squatters register near-miss or otherwise believable package names that they hope you’ll fetch by mistake.
Recent examples, now removed, that turned up just last week in the Python Package Index include:
Fake name        Possible target   Function of real package   Difference
--------------   ---------------   ------------------------   -----------------------
asteroids        asteroid          Audio processing           Plural, not singular
beauitfulsoup4   beautifulsoup4    HTML/XML parsing           Typo (letters swapped)
llvm             llvmpy            LLVM compiler              Suffix left off
winpty           winpy             Windows functions          Extra letter inserted
wwebsite         website           HTML manipulation          Doubled letter at start
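Near-miss names like these can be partly caught automatically. Here’s a minimal Python sketch, not a PyPI or pip feature, that compares a requested name against an allow-list of packages you actually intend to use, and flags anything one or two edits away. The allow-list is a hypothetical example. The distance measure is the “optimal string alignment” variant of Damerau-Levenshtein distance, chosen because it counts a swap of two adjacent letters (as in beauitfulsoup4) as a single edit, and names are first normalised the way PEP 503 does it, so case and runs of -, _ and . don’t matter.

```python
# Sketch: flag package names that are close to, but not the same as, names
# on your own allow-list. INTENDED is a hypothetical example allow-list.
import re
from typing import Iterable, List, Tuple

INTENDED = ["asteroid", "beautifulsoup4", "llvmpy", "winpy", "website", "requests"]


def normalize(name: str) -> str:
    """PEP 503 normalisation: ignore case, treat runs of -, _ and . alike."""
    return re.sub(r"[-_.]+", "-", name).lower()


def osa_distance(a: str, b: str) -> int:
    """Edit distance in which swapping two adjacent characters costs 1."""
    # d[i][j] holds the distance between a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a character
                d[i][j - 1] + 1,         # insert a character
                d[i - 1][j - 1] + cost,  # substitute (or keep) a character
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap neighbours
    return d[len(a)][len(b)]


def near_misses(candidate: str, known: Iterable[str], max_distance: int = 2) -> List[Tuple[str, int]]:
    """Known names within max_distance edits of candidate, closest first."""
    target = normalize(candidate)
    hits = []
    for name in known:
        distance = osa_distance(target, normalize(name))
        if 0 < distance <= max_distance:  # 0 means it IS the known package
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    for fake in ["asteroids", "beauitfulsoup4", "llvm", "winpty", "wwebsite"]:
        print(fake, near_misses(fake, INTENDED))
```

With the names from the table, every fake comes out at distance 1 from its target except llvm, which is 2 away from llvmpy. A threshold of 2 will throw up false alarms for very short names, so treat a hit as a reason to look more closely, not as proof of foul play.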
Meddling considered harmful
As far as we are aware, none of these fake packages contained outright malware, or indeed any permanent package code at all.
However, some of them (if not all – it’s hard to check now that they have been removed) included a Python command that was intended to run when the package was installed, rather than when it was used.
The command looked like this:
url = "h"+"t"+"t"+"p"+":"+"/"+"/"+[REDACTED IP NUMBER]+"/name?FAKEPACKAGENAME"
requests.get(url, timeout=30)
This is a crude but simple way to do what’s known in the jargon as telemetry – in other words, to keep track remotely of who has downloaded and installed the package.
The code above simply calls home to a remote web server with the name of the installed package in the URL, and ignores the data that comes back, if there is any.
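If you want to check a source package for this sort of install-time surprise before you let it run, one approach is to parse its setup.py without executing it. Here’s a minimal Python sketch of that idea using the standard ast module: it reassembles URLs that have been chopped into pieces and glued back together with +, as in the snippet above, and reports calls on a short watch-list of network functions. The watch-list is our own assumption, the address 192.0.2.1 (a reserved documentation address) stands in for the redacted IP number, and this is a review aid, not a malware scanner: code that hides its tricks more carefully will slip past it.

```python
# Review aid (heuristic, not a malware scanner): parse, but never run, an
# untrusted setup.py and report (a) URLs glued together from string pieces
# and (b) calls on a small, hypothetical watch-list of network functions.
import ast
from typing import List, Optional

WATCHED_CALLS = {"requests.get", "requests.post", "urllib.request.urlopen", "urlopen"}


def fold_strings(node: ast.AST) -> Optional[str]:
    """Value of a string literal, or of literals joined with '+', else None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        left, right = fold_strings(node.left), fold_strings(node.right)
        if left is not None and right is not None:
            return left + right
    return None


def dotted_name(node: ast.AST) -> str:
    """'requests.get' for requests.get, '' for anything more complicated."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else ""
    return ""


class InstallScriptScanner(ast.NodeVisitor):
    def __init__(self) -> None:
        self.findings: List[str] = []

    def visit_BinOp(self, node: ast.BinOp) -> None:
        text = fold_strings(node)
        if text is None:
            self.generic_visit(node)  # not all literals: look inside instead
        elif text.startswith(("http://", "https://")):
            # Only the outermost chain is reported; its sub-chains are prefixes
            self.findings.append(f"line {node.lineno}: URL built from pieces: {text}")

    def visit_Call(self, node: ast.Call) -> None:
        name = dotted_name(node.func)
        if name in WATCHED_CALLS:
            self.findings.append(f"line {node.lineno}: call to {name}")
        self.generic_visit(node)


def scan(source: str) -> List[str]:
    """Parse (never execute) the source and return the findings in order."""
    scanner = InstallScriptScanner()
    scanner.visit(ast.parse(source))
    return scanner.findings


SAMPLE = (
    'import requests\n'
    'url = "h"+"t"+"t"+"p"+":"+"/"+"/"+"192.0.2.1"+"/name?FAKEPACKAGENAME"\n'
    'requests.get(url, timeout=30)\n'
)

if __name__ == "__main__":
    for finding in scan(SAMPLE):
        print(finding)
```

Note that scan() only parses the code; it never imports or runs it, which is the whole point when the code might be hostile. Ordinary metadata such as setup(url="https://example.com/project") isn’t flagged, because a URL written out in one piece is perfectly normal in a setup.py file.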
Presumably, the redacted IP number in the above URL (it’s a Tencent cloud server hosted in Tokyo, Japan, for what that’s worth) is operated by the uploader of the above packages…
…who goes by the unusual and mildly ungrammatical moniker Remind Supply Chain Risks.
Fascinatingly, if rather pointlessly, this user didn’t just upload the five fake libraries listed above, but a grand total, according to the Wayback Machine, of 3951 utterly bogus PyPI packages.
Peculiarly, many, if not most, of the package names were either incongruous or unlikely to be chosen by mistake, such as Build-Number-Incrementor-for-C-Sharp and Web-Service-for-Android-GMaps-AsyncTask-Demo.
We haven’t been able to figure out where or how our mystery Supply Chain Risks user generated their list of fake package names, but perhaps just having a small number of “real-looking” typosquat fakes amongst the vast sea of bogus and even ludicrous ones was part of the plan?
At any rate, it looks as though Remind Supply Chain Risks subscribes to the idea that a job worth doing (or, as in this case, a job that isn’t really worth doing at all) is worth overdoing.
Fortunately, the Python team has already removed all these offending items…
…although we couldn’t help noticing that there is already a new fake beautifulsoup4 imposter in the PyPI database, this time entitled beatufulsoup4, uploaded on 2021-03-03.
This one contains no code at all, but it does have the this-would-be-wittier-if-it-were-not-wearing-a-bit-thin-by-now project title “You may want to install beautifulsoup4, not beautfulsoup4” to prove its this-didn’t-really-need-proving-yet-again point.
What to do?
- Don’t do mass bogus uploads like this to prove your point. We appreciate the message you are trying to deliver, but it’s already been documented, so you are just making distracting work for other people who could more usefully be doing something else for the project.
- Don’t choose a PyPI package just because the name looks right. Check that you really are downloading the right module from the right publisher. Even legitimate modules sometimes have names that clash, compete or confuse.
- Don’t hook internal projects to external repositories by mistake. If you are using Python packages that you haven’t published externally, then the one thing you can be sure of is that all external copies of “your” package are imposter modules, probably malware.
- Don’t blindly download package updates into your own development or build systems. Test and review everything you download before you approve it for use. Remember that packages can include installation scripts that run when you do the update, so malware could be delivered as part of the update process itself, rather than in the module source code that ultimately gets installed.
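The last piece of advice can be backed up in code. Here’s a minimal Python sketch that only accepts a downloaded package file if it appears on an allow-list of reviewed files and its SHA-256 digest still matches the one recorded at review time. The allow-list format (file name mapped to hex digest) is an assumption made purely for illustration.

```python
# Sketch of "review before you approve": accept a package file only if it is
# on a (hypothetical) allow-list of reviewed files and its contents have not
# changed since the review, as judged by its SHA-256 digest.
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_approved(path: Path, approved: Dict[str, str]) -> bool:
    """True only for reviewed files whose digest still matches."""
    expected = approved.get(path.name)
    if expected is None:
        return False  # never reviewed, so don't install it
    return sha256_of(path) == expected.lower()
```

pip has this idea built in as hash-checking mode: run pip install --require-hashes -r requirements.txt with a --hash=sha256:... value on every requirement, and pip refuses any download that doesn’t match. A matching hash proves only that you got the file you reviewed, of course, not that your review caught everything.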