Researchers have found that one of the world’s most popular source code hosting sites is still housing thousands of publicly accessible secret keys.
Over 100,000 repositories on the source code management site GitHub contain secret access keys that can give attackers privileged access to those repositories (repos) or to the online services the keys belong to.
Researchers at North Carolina State University (NCSU) scanned almost 13% of GitHub’s public repositories over nearly six months. In a paper revealing the findings, they said:
We find that not only is secret leakage pervasive – affecting over 100,000 repositories – but that thousands of new, unique secrets are leaked every day.
The credentials that developers routinely publish on their GitHub repos fall into several categories. These include SSH private keys, cryptographic keys that let whoever holds them log in to servers and other online resources without a password. Another is application programming interface (API) keys, also known as tokens, which let developers access online services ranging from Twitter to Google Search directly from their programs. The researchers found a mixture of these keys for services including Google, Twitter, Amazon Web Services, Facebook, MailChimp, online telephony service Twilio, and credit card processing companies Stripe, Square, and Braintree.
These leaks sometimes exposed high-value targets. The researchers found Amazon Web Services (AWS) credentials for a large website serving millions of US college applicants. They also found AWS credentials for the website of a major government agency in a Western European country.
How does it happen?
Developers sometimes get careless when updating the code on their machines and then sending it to GitHub, which they typically do using Git commands known as commits and pushes.
Coders will sometimes store SSH keys and API keys in the same directories as their source code, where they get swept up in the commit and push process. It’s an easy mistake to make with SSH keys, which developers often generate from the command line and can end up saving inside a project folder. Some other mishaps are even more facepalm-worthy, such as embedding API keys directly in source code.
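A common way to keep an API key out of source code is to have the program read it at run time from an environment variable that is set outside the repo. The Python sketch below illustrates the idea; the variable name EXAMPLE_SERVICE_API_KEY and the helper get_api_key are hypothetical, and a real deployment might use a dedicated secrets manager instead.

```python
import os


def get_api_key(var_name: str = "EXAMPLE_SERVICE_API_KEY") -> str:
    """Return an API key read from an environment variable.

    EXAMPLE_SERVICE_API_KEY is a hypothetical name. The value is set outside
    the repository (for example in the shell or a deployment's settings), so
    it never appears in a commit.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail loudly instead of falling back to a key written into the code.
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key


# The risky alternative the article describes: the key is part of the source
# file, so every commit and push copies it into the repo.
# API_KEY = "real-key-pasted-here"
```

The source file now only names where the key comes from, so committing and pushing it reveals nothing secret.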
One way of preventing private keys from being committed is to list where they are in a .gitignore file. This file tells Git which files to leave out of commits, so they never reach a GitHub repo. Some developers got this backwards, pasting their secrets directly into the .gitignore file itself; because that file is normally committed along with the code, the secrets ended up in their repos.
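Entries in a .gitignore are path patterns, not secrets. To show what telling .gitignore where the keys are means in practice, here is a simplified Python approximation of how a pattern list decides whether a file is left out. is_ignored is a hypothetical helper, and it deliberately skips several of Git’s real rules (negation with !, directory-only patterns, ** and ignored parent directories), so treat it as an illustration rather than a faithful reimplementation.

```python
import fnmatch
import posixpath


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Roughly decide whether a .gitignore pattern list would exclude `path`.

    Simplifications (assumptions, not Git's real rules): no negation (!),
    no directory-only patterns (trailing /), no ** handling, no matching of
    ignored parent directories, and fnmatch's "*" also matches "/", which
    Git's "*" does not.
    """
    path = path.lstrip("/")
    name = posixpath.basename(path)
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments are skipped, as in Git
        if "/" in pattern:
            # Patterns containing a slash are matched against the whole path.
            if fnmatch.fnmatchcase(path, pattern.lstrip("/")):
                return True
        elif fnmatch.fnmatchcase(name, pattern):
            # Slash-free patterns match a file name in any directory.
            return True
    return False


# Example pattern list: it names where secrets live, never the secrets themselves.
PATTERNS = ["# keep credentials out of commits", "*.pem", "id_rsa", "/config/secrets.env"]
```

With these patterns, deploy/id_rsa, keys/server.pem and config/secrets.env are all excluded, while src/app.py is not.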
Some services that use OAuth require multiple secrets for access, such as a secret key and an ID. That didn’t provide much extra security in this case, though, because four in five of the repos holding these secrets also contained the other information required to access the third-party service.
Many developers did nothing when notified of the problem, according to the paper. Those who tried to fix it tended to push new commits that removed the secrets. This doesn’t work, because GitHub is a front end for Git, a version control system that deliberately keeps the contents of past commits so you can keep track of what changed, and when.
What devs really need to do is either rewrite their history to remove the offending commit, or delete the entire repo and start again without storing the secret, said the researchers. Most people did neither.
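Why a follow-up commit doesn’t help can be seen in a toy model where each commit stores a full snapshot of the project. This is a simplification (real Git stores content-addressed objects), and Commit, commits_containing and scrub_history are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    """A toy commit: a message plus a full snapshot of file contents."""
    message: str
    files: dict[str, str]  # path -> file contents at this commit


def commits_containing(history: list[Commit], secret: str) -> list[str]:
    """Return the messages of every commit whose snapshot contains `secret`."""
    return [
        c.message
        for c in history
        if any(secret in text for text in c.files.values())
    ]


def scrub_history(history: list[Commit], secret: str,
                  placeholder: str = "REDACTED") -> list[Commit]:
    """Toy history rewrite: build new snapshots with the secret replaced."""
    return [
        Commit(c.message, {p: t.replace(secret, placeholder) for p, t in c.files.items()})
        for c in history
    ]


if __name__ == "__main__":
    secret = "FAKE-SECRET-123"  # a made-up value, not a real key
    history = [
        Commit("add config", {"config.py": f'KEY = "{secret}"'}),
        Commit("remove key", {"config.py": 'KEY = os.environ["KEY"]'}),
    ]
    print(commits_containing(history, secret))  # ['add config']
    print(commits_containing(scrub_history(history, secret), secret))  # []
```

The "remove key" commit leaves the secret sitting in the earlier snapshot; only rewriting every snapshot gets rid of it. In real Git, rewriting history also changes the IDs of the rewritten commits and their descendants, and it cannot reach copies that others have already cloned or forked, which is one reason to revoke a leaked key as well.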
How did the researchers find these keys? Was it via some nefarious hack or loophole in the website? Nope – they just searched for them. GitHub has a search API that can be used to search across all its public repos, and it happily delivers the secret key data.
Paper co-author Brad Reaves told us:
While we used the Search API, which requires an API key that can be obtained for free by any GitHub user, keys can also be found with the online search function.
This has been a problem since at least 2013, when GitHub took its code search service offline for a while after users spotted secret keys turning up in results. He added:
After this was publicized, GitHub took down the Code Search tool, claiming unrelated reasons, but shortly relaunched the tool with the same functionality.
So is all of this GitHub’s fault? Hardly. As Reaves pointed out:
Code search is a great tool, but it would be very difficult for GitHub to build a tool that censored all possible secrets; the burden is on developers not to post secrets to public repositories.
To its credit, GitHub, which Microsoft acquired for $7.5bn in October 2018, is trying to make things better. It introduced rate limits for its search tool, although the paper points out that an attacker could get around them by spreading searches across multiple accounts. It has also been scanning repositories for several years to find GitHub OAuth tokens and personal access tokens, which can be used to access people’s GitHub repositories.
In October 2018, GitHub also announced partnerships with third-party online services as part of a new feature called Token Scanning. This scans new commits, and repos that are switched from private to public, for service providers’ API keys and notifies the appropriate service provider when it finds them. That service provider may then choose to revoke the credentials, which is the step GitHub recommends, according to a spokesperson there. She also told us that it has shared information on more than 100 million compromised tokens so far.
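Token Scanning’s actual detection rules aren’t described here, but many providers issue keys with recognizable prefixes, which is what makes pattern matching on new commits feasible. The sketch below scans the added lines of a diff and groups candidate keys by provider, roughly the information a provider would need before deciding whether to revoke. The three regular expressions are assumptions based on commonly cited key formats, not GitHub’s rules, and scan_added_lines is a hypothetical name.

```python
import re

# Illustrative patterns only (assumptions based on commonly cited key
# formats), not GitHub's actual Token Scanning rules.
PROVIDER_PATTERNS = {
    "Amazon Web Services": re.compile(r"AKIA[0-9A-Z]{16}"),  # long-term access key IDs
    "Google": re.compile(r"AIza[0-9A-Za-z_-]{35}"),
    "Stripe": re.compile(r"sk_live_[0-9A-Za-z]{24,}"),
}


def scan_added_lines(diff_text: str) -> dict[str, list[str]]:
    """Map provider name -> candidate keys found on added ('+') diff lines."""
    findings: dict[str, list[str]] = {}
    for line in diff_text.splitlines():
        # Only newly added lines matter; skip the '+++' file header line.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for provider, pattern in PROVIDER_PATTERNS.items():
            for match in pattern.findall(line):
                findings.setdefault(provider, []).append(match)
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
SAMPLE_DIFF = "\n".join([
    "--- a/settings.py",
    "+++ b/settings.py",
    "-region = 'us-east-1'",
    "+aws_key_id = 'AKIAIOSFODNN7EXAMPLE'",
])
```

Running scan_added_lines(SAMPLE_DIFF) flags the AWS key ID on the added line; a key that appears only on a removed ("-") line is not reported, since that commit is not introducing it.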
It’s a start, said Reaves, but GitHub’s work can only solve the problem up to a point:
I think efforts like GitHub’s Token Scanning project should be applauded, but they are only effective once a leak has already occurred. This problem also is likely not isolated to GitHub – it will affect any publicly available code. We need more research to develop systems that help developers avoid this mistake in the first place.
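As one example of prevention rather than detection, a Git pre-commit hook (a script Git runs before each commit, where a non-zero exit status aborts the commit) could refuse files that look like private keys. The sketch below is hypothetical: find_problems and hook_exit_status are made-up names, the staged files are passed in as a path-to-text mapping, and the checks cover only default OpenSSH key file names and PEM-style private key headers.

```python
import re
import sys

# Matches headers such as "-----BEGIN OPENSSH PRIVATE KEY-----" or
# "-----BEGIN PRIVATE KEY-----".
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----")
# Some of the default file names that ssh-keygen uses for private keys.
RISKY_NAMES = {"id_rsa", "id_dsa", "id_ecdsa", "id_ed25519"}


def find_problems(staged: dict[str, str]) -> list[str]:
    """Describe staged files (path -> text) that look like private keys."""
    problems = []
    for path, text in staged.items():
        name = path.rsplit("/", 1)[-1]
        if name in RISKY_NAMES:
            problems.append(f"{path}: looks like an SSH private key file")
        elif PRIVATE_KEY_HEADER.search(text):
            problems.append(f"{path}: contains a private key block")
    return problems


def hook_exit_status(staged: dict[str, str]) -> int:
    """Print any problems and return the status a pre-commit hook would exit with."""
    problems = find_problems(staged)
    for problem in problems:
        print("refusing to commit:", problem, file=sys.stderr)
    return 1 if problems else 0
```

In a real hook, the list of staged files could come from git diff --cached --name-only and each file’s staged contents from git show :path. Simple checks like these will miss many kinds of secret, so they complement scanning rather than replace it.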
Kudos to GitHub for trying its best to solve the problem, but it’s up to developers to use services like this – and the associated tools like Git – properly.