How many user credentials have fallen into the hands of criminals during a decade of data breaches?
Earlier this month, the Have I Been Pwned? (HIBP) website offered a partial answer to that question by uploading something called Collection #1, a database of 773 million unique email addresses discovered circulating on a criminal forum.
Now researchers at Germany’s Hasso-Plattner Institute (HPI) have reportedly analysed a second cache that was part of the same discovery. This cache consists of four collections named, unsurprisingly, Collections #2-5, which they think contain a total of 2.2 billion unique pairs of email addresses and passwords.
Collection #1 consists of 87GB of data cobbled together from more than 2,000 individual data breaches going back years.
Collections #2-5, for comparison, are said to total 845GB and cover 25 billion records.
It’s a dizzying volume of data, which, despite the hundreds of millions or more people it must represent, is still small enough to fit on the hard drive of a recent Windows computer.
The obvious measure of these breaches is how much new data they represent, meaning data that has not already been added to databases such as those amassed by HIBP or HPI.
Have I Been Pwned? estimated the unique data in Collection #1 at around 140 million email addresses and at least 11 million unique passwords.
HPI, meanwhile, estimates the number of new credentials at 750 million (it isn’t yet clear how many new passwords this includes).
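To make the difference between "unique" and "new" concrete, here is a minimal Python sketch that treats it as a set difference. The function names and sample addresses are hypothetical, and lowercasing addresses before comparing them is an assumption made for illustration; it is not a description of how HIBP or HPI actually process their data.

```python
def normalise(email: str) -> str:
    """Trim whitespace and lowercase an address (an assumed normalisation)."""
    return email.strip().lower()


def new_addresses(known, dump):
    """Return the addresses in `dump` that are not already in `known`.

    Building sets removes duplicates inside the dump ("unique" records);
    subtracting the known set leaves only the "new" ones.
    """
    known_set = {normalise(e) for e in known}
    dump_set = {normalise(e) for e in dump}
    return dump_set - known_set


# Hypothetical sample data, not taken from any real breach.
already_known = ["alice@example.com"]
fresh_dump = ["Alice@Example.com ", "bob@example.com", "bob@example.com"]
print(new_addresses(already_known, fresh_dump))
```

In this toy case the dump holds three records but only two unique addresses, and only one of those is new.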
The re-use deluge
When faced with these sorts of numbers, it’s tempting to shrug one’s shoulders and move on – most of these data breaches are old, so what harm might they be doing now?
Initially, breached credentials are probably traded to give attackers access to the account on the service from which they were stolen.
After that, they are quickly traded again to use as fuel for the epidemic of credential stuffing attacks. Credential stuffing thrives on our habit of reusing passwords – credentials for one service will often give a criminal access to other websites too.
Remember that while plaintext passwords are pay-dirt for criminals, usernames and email addresses are also valuable because they give attackers something to aim at when trying a brute-force attack.
But perhaps the real significance isn’t the volume of data so much as what it shows about how criminals are able to build databases from lots of smaller breaches.
That’s where all the stolen credentials go – into larger databases where they can be more easily exploited.
Why have Collections #1-5 only come to light now?
Either because the data has already been exploited and is now so old that it no longer has much commercial value (Collection #1 was offered for sale at $45), or because so many criminals have access to it that it has effectively become an open source resource.
What to do?
It’s possible to check your email address and password against HIBP, although the site doesn’t appear to have uploaded Collections #2-5 yet. You can also check your email address against the HPI data.
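Checking a password need not mean handing it over. The sketch below assumes HIBP's publicly documented Pwned Passwords range endpoint (`https://api.pwnedpasswords.com/range/`), which takes the first five characters of a password's SHA-1 hash and returns matching hash suffixes as `SUFFIX:COUNT` lines. The function names and the User-Agent string are hypothetical, and this is an illustration rather than an official client.

```python
import hashlib
import urllib.request

# Assumed endpoint: HIBP's Pwned Passwords range API.
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str):
    """Return (5-char prefix, 35-char suffix) of the uppercase SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_response(suffix: str, body: str) -> int:
    """Look for `suffix` in a body of 'SUFFIX:COUNT' lines; return 0 if absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range API. Only the 5-character hash prefix is sent."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_response(suffix, body)
```

Calling `pwned_count("some password")` returns how many times that password appears in the service's data, with 0 meaning it was not found. The password itself and its full hash never leave your machine, because the comparison against the returned suffixes happens locally.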
No organisation is immune to the possibility of a breach. That’s why individuals must do more to secure themselves rather than trusting others to do it for them.
Start with simple principles:
- Use a password manager, not only to store passwords but to choose strong ones in the first place.
- These should be unique – use a different random password for every site.
- Where possible, turn on two-factor authentication (2FA). Some versions of authentication are superior to others, but any version is much better than nothing.
- If you think you might have reused any credentials in the past, change those ASAP.
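For a sense of what "a different random password for every site" means in practice, here is a minimal Python sketch using the standard library's `secrets` module. The function names, the 20-character default, the 12-character minimum and the example site names are all assumptions made for illustration. A real password manager does this for you and also stores the results securely, which this sketch does not attempt.

```python
import secrets
import string

# Characters to draw from: letters, digits and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 20) -> str:
    """Build a password from a cryptographically secure random source."""
    if length < 12:
        raise ValueError("choose at least 12 characters")  # assumed minimum
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def passwords_for(sites, length: int = 20) -> dict:
    """Generate an independent random password for each site name."""
    return {site: random_password(length) for site in sites}


# Hypothetical site names, for illustration only.
for site, password in passwords_for(["shop.example", "mail.example"]).items():
    print(site, password)
```

Because each password is generated independently, a breach at one site reveals nothing useful about the credentials for any other, which is exactly what credential stuffing relies on.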