Naked Security

PyPI open-source code repository deals with manic malware maelstrom

Controlled outage used to keep malware marauders from gumming up the works. Learn what you can do to help in future...

Public source code repositories, from SourceForge to GitHub, from the Linux Kernel Archives to ReactOS.org, from PHP Packagist to the Python Package Index, better known as PyPI, are a fantastic source (sorry!) of free operating systems, applications, programming libraries, and developers’ toolkits that have done computer science and software engineering a world of good.

Most software projects need “helper” code that isn’t a fundamental part of the problem that the project itself is trying to solve, such as utility functions for writing to the system log, producing colourful output, uploading status reports to a web service, creating backup archives of old data, and so on.

In cases like that, you can save time (and benefit for free from other people’s expertise) by searching for a package that already exists in one of the many available repositories, and hooking that external package into your own tree of source code.

In the other direction, if you’re working on a project of your own that includes some useful utilities you couldn’t find anywhere else, you might feel inclined to offer something to the community in return by packaging up your code and making it available for free to everyone else.

The cost of free

As you’re no doubt aware, however, community source code repositories bring with them a number of cybersecurity challenges:

  • Popular packages that suddenly vanish. Sometimes, packages that a well-meaning programmer has donated to the community get so popular that they end up as a critical part of thousands or even hundreds of thousands of bigger projects that take them for granted. But if the original programmer decides to withdraw from the community and to delete their projects (which they have every right to do if they have no formal contractual obligations to anyone who’s chosen to rely on them), the side-effects can be temporarily disastrous, as other people’s projects suddenly “update” to a state in which a necessary part of their code is missing.
  • Projects that get actively hijacked for evil. Cybercriminals who guess, steal or buy passwords to other people’s projects can inject malware into the code, and anyone who already trusts the once-innocent package will unwittingly infect themselves (and perhaps their own customers) with malware if they download the rogue “update” automatically. Crooks can even take over old projects using social engineering trickery, by joining the project and being really helpful for a while, until the original maintainer decides to trust them with upload access.
  • Rogue packages that masquerade as innocent ones. Crooks regularly upload packages that have names that are sufficiently close to well-known projects that other users download and use them by mistake, in an attack jocularly known as typosquatting. (The same trick works for websites, hoping that a user who mistypes a URL even slightly will end up on a bogus look-alike site instead.) The crooks generally clone the genuine package first, so it still performs all the functions of the original, but with some additional malicious behaviour buried deep in the code.
  • Petulant behaviour by so-called “researchers”. We’ve sadly had to write about this sort of probably-legal-but-ethically-dubious behaviour several times. Examples include a US PhD student and their supervisor who deliberately uploaded fake patches to the Linux kernel as part of an unauthorised experiment that the core Linux team were left to sort out, and a self-serving “expert” with the nickname Supply Chain Risks who uploaded a booby-trapped fake project to the PyPI repository as a reminder of the risk of so-called supply chain attacks. SC Risks then followed up their proof-of-concept “research” package with a further 3950 packages, leaving the PyPI team to find and delete them all.
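
As a rough illustration of how a typosquatted name might be spotted before it gets installed, here is a minimal Python sketch. It assumes you keep a list of the packages you actually intend to use (the KNOWN_GOOD set below is a hypothetical example, as are the function names and the 0.8 similarity threshold), and it flags any requested name that is suspiciously close to, but not the same as, one of them. Names are normalised the way PyPI treats them, so that differences in case or in "-", "_" and "." don't count.

```python
import re
from difflib import SequenceMatcher

# Hypothetical allowlist: the packages your project genuinely depends on.
KNOWN_GOOD = {"requests", "numpy", "python-dateutil", "urllib3"}


def normalise(name: str) -> str:
    """Lower-case the name and treat runs of '-', '_' and '.' as one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def lookalikes(candidate: str, known=KNOWN_GOOD, threshold: float = 0.8) -> list:
    """Return known package names that the candidate closely resembles.

    An exact match (after normalisation) is treated as fine and returns
    an empty list. The threshold is an illustrative guess, not a tuned value.
    """
    cand = normalise(candidate)
    names = {normalise(n) for n in known}
    if cand in names:
        return []
    return sorted(
        n for n in names
        if SequenceMatcher(None, cand, n).ratio() >= threshold
    )


if __name__ == "__main__":
    for wanted in ["requests", "reqeusts", "Python_Dateutil"]:
        hits = lookalikes(wanted)
        if hits:
            print(f"{wanted!r} looks like {hits}: check the name before installing")
        else:
            print(f"{wanted!r}: no near-miss against the allowlist")
```

A check like this only catches near-miss names. It says nothing about whether a correctly-named package has been hijacked, so it complements a review of the package itself rather than replacing it.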

Rogue uploaders

Unfortunately, PyPI seems to have been hammered by a bunch of rogue, automated uploads over the past weekend.

The team has, perhaps understandably, not yet given any details of how the attack was carried out, but the site temporarily blocked anyone new from joining up, and blocked existing users from creating new projects:

New user and new project name registration on PyPI is temporarily suspended. The volume of malicious users and malicious projects being created on the index in the past week has outpaced our ability to respond to it in a timely fashion, especially with multiple PyPI administrators on leave.

While we re-group over the weekend, new user and new project registration is temporarily suspended. [2023-05-20T16:02:00Z]

We’re guessing that the attackers were using automated tools to flood the site with rogue packages, presumably hoping that if they tried hard enough, some of the malicious content would escape notice and get left behind even after the site’s cleanup efforts, thus completing what you might call a Security Bypass Attack…

…or perhaps that the site administrators would feel compelled to take the entire site offline to sort it out, thus causing a Denial of Service Attack, or DoS.

The good news is that in just over 24 hours, the team got on top of the problem, and was able to announce, “Suspension has been lifted.”

In other words, even though PyPI was not 100% functional over the weekend, there was no true denial of service against the site or its millions of users.

What to do?

  • Don’t choose a repository package just because the name looks right. Check that you really are downloading the right module from the right publisher. Even legitimate modules sometimes have names that clash, compete or confuse.
  • Don’t blindly download package updates into your own development or build systems. Test and review everything you download before you approve it for use. Remember that packages typically include update-time scripts that run when you do the update, so malware infections could be delivered via the update process itself, not as part of the package source code that gets left behind afterwards.
  • Don’t make it easy for attackers to get into your own packages. Choose proper passwords, use 2FA whenever you can, and don’t blindly trust newcomers to your project as soon as they start angling to get maintainer access, no matter how keen you are to hand the reins to someone else.
  • Don’t be a you-know-what. As this story reminds us all, volunteers in the open source community have enough trouble with genuine cybercriminals without having to deal with “researchers” who conduct proof-of-concept attacks for their own benefit, whether for academic purposes or for bragging rights (or both).
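
One way to avoid blindly accepting whatever a repository serves up is to record the checksum of each package file you have reviewed, and to refuse anything that doesn't match. The Python sketch below shows the idea using SHA-256. The function names are hypothetical, and it assumes you have already stored the expected digest somewhere you trust. (The pip tool has a hash-checking mode of its own, enabled with --require-hashes, that applies the same principle to a requirements file.)

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, expected_hex: str) -> bool:
    """True only if the file's digest equals the digest you reviewed and pinned."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # Illustrative only: create a stand-in "download" and pin its digest.
    artifact = Path("example-download.bin")
    artifact.write_bytes(b"package contents you have reviewed")
    pinned = sha256_of(artifact)

    # Simulate an unexpected change to the file, such as a rogue "update".
    artifact.write_bytes(b"package contents plus something unexpected")
    if not matches_pinned_hash(artifact, pinned):
        print("Digest mismatch: do not install until the change has been reviewed")
    artifact.unlink()
```

A matching digest only tells you that the file is the one you approved earlier. You still need to test and review it before you pin it in the first place.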

8 Comments

This is part of a big problem with today’s internet – anonymity. If everyone had a verified digital ID with an attached real name, and used it for all services, then cybercrime, online mobbing, spam etc. would be much reduced. Security would be improved. I would not mind paying $0.001 to send mail or the same to receive newsletters.

You should still be able to turn off the ID by default, e.g., when you are browsing or wish to send an anonymous whistleblowing report.


If only there were someone we were all inclined to trust with those verified digital IDs, eh? An organisation that would have no inclination to privatise the service, or to outsource it overseas to the lowest bidder, or to recoup the cost by selling the “unimportant” parts of your personal data to marketing agencies, or to back it up into insecure cloud buckets, or to lease it to analytics companies, or to use it for long-term policy planning, or…


Every country has a government which, like it or not, is the de facto trusted entity entitled to define and enforce property rights, etc. Every government could issue a single identity token to each person and legal entity which would be required for all transactions and dealings requiring identification. Multifactor biometric/genetic keys could be used to secure the identity token against theft. Activities explicitly allowing anonymity could still be tolerated, but any activity requiring identification could be forced to use the identity token.

I think most people don’t like the idea of a universal identity token, since it eliminates a great many opportunities for fraud and would thereby eliminate a great many “business opportunities”. Hence the status quo.


We already have ID in the form of identity cards, driving licences, passports etc. Your data is already stored by the government. The physical ID can’t be used to verify your ID online (yet), unfortunately.


This is a fantastic overview of some of the main risks for the open source community. Well done, Duck!


Thanks. Glad you found it useful.

Getting spammed with massive amounts of new malware in intense bursts is a problem we’ve faced since the first automated malware construction kits showed up in about 1990 (annoyingly at about the same time as affordable CD-R drives – snail-mailed CDs had really high bandwidth in those days!)…

…so our sympathy to any community caught up in this sort of anti-social-at-best and outright-criminal-at-worst malware spammage.


Wow. I appreciate the heads-up. Thank you, Paul Ducklin and Sophos. Think paranoid, guys.


Welllll, no need for paranoia… that’s a step too far.

But not blindly connecting your DevOps/CI system to a “wget the latest code straight from someone else’s repository to ensure you are up to date” is a good start.
