The state of World Wide Web security in 2021

Editor’s note: This article is one of a three-part series exploring how secure internet users really are in 2021. The other articles in the series are “Password managers can make your network more secure – but mind the gaps” and “Don’t fear the Wi-Fi.”

A recent discussion raised the question of whether the average internet user is safer or more at risk now than 10 years ago.

If you read the news headlines, you might think we have gone from bad to worse, yet my gut reaction is that we’ve never been safer. Clearly, we haven’t “solved” security, yet it feels like we’ve checked a lot of items off the list.

To verify, I decided to take a look at the advancements we’ve made and see if they are making a difference.

Security in the early days

Today’s World Wide Web is a very different place from when it sprang from the mind of Sir Tim Berners-Lee in 1990. While the early web was free and open, it was a little too open. There was neither privacy nor encryption to protect information moving between the numerous servers and routers involved in connecting the world.

Netscape helped solve this by introducing encryption through Secure Sockets Layer (SSL), later updated to a formal specification, Transport Layer Security (TLS). At the time, TLS was intended to secure your shopping cart, credit card information and occasionally your login ID and password.

Security by default

Strangely, this remained true all the way up until 2013, when an NSA contractor, Edward Snowden, decided to tell the world about how much online information the United States was gathering – and was able to gather – on almost everyone in the world.

Despite this, as late as October 2013, a few months after Snowden’s leaks, only 27.5% of web pages loaded by Mozilla Firefox were using some form of encryption.

This prompted people in the security industry to take an interest and work to improve the security and privacy of internet users globally.

The thinking was: the only way to solve this problem is to encrypt everything and make it a requirement, not an afterthought. This spurred on the introduction of new technologies and standards to ensure that things were secure by default and to prevent things from being downgraded to use old insecure methods.

New technology and standards didn’t eliminate the risk though. If someone can meddle with your network connections, they can simply redirect to an imposter site to steal your private information. This is known as a machine-in-the-middle attack (MitM), which could be conducted by providing false DNS (Domain Name System) responses, operating an evil twin Wi-Fi access point, or directly by ISPs (Internet Service Providers), governments, law enforcement, and others. Companies can even intercept TLS traffic using middle boxes designed to inspect protected traffic.

Fixing the problem

Even if the site you’re visiting is using HTTPS, it likely must listen on insecure HTTP (HyperText Transfer Protocol) and redirect users to the secure site, as web browsers typically default to trying HTTP first.

To tell the browser to make the initial connection over HTTPS, in 2012 Google proposed a new HTTP header: HSTS (HTTP Strict Transport Security). This HTTP header allowed website administrators to indicate their website should only ever be loaded over HTTPS and that browsers should never attempt making connections over HTTP on port 80.

Of course, this still means you could be at risk of a downgrade to HTTP the very first time you visit a site, before your browser has observed the HSTS header. This is known as SSL stripping, which is the type of MitM attack HSTS was designed to cure in the first place.

To solve this problem, HSTS has been extended with a “preload” option. Once the preload directive is appended to your HSTS header, you can submit your site to https://hstspreload.org to be listed as a built-in, always secure site for Chrome, Firefox, Opera, Safari, Internet Explorer, and Edge.
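As a rough illustration, the check below parses a Strict-Transport-Security header value and tests whether it looks ready for preload submission. It is a minimal sketch, not an official validator: the function names are hypothetical, and the criteria (a max-age of at least 31,536,000 seconds plus the includeSubDomains and preload directives) reflect the header requirements published on hstspreload.org as I understand them. That site also lists non-header requirements, such as a valid certificate and an HTTP-to-HTTPS redirect, which are not checked here.

```python
ONE_YEAR = 31536000  # 365 days in seconds; assumed minimum max-age for preloading


def parse_hsts(value: str) -> dict:
    """Split an HSTS header value into a dict of lower-cased directives."""
    directives = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without a value (e.g. "preload") map to an empty string.
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def preload_ready(value: str) -> bool:
    """Return True if the header carries the directives assumed above."""
    directives = parse_hsts(value)
    try:
        max_age = int(directives.get("max-age", ""))
    except ValueError:
        return False  # max-age missing or not a number
    return (
        max_age >= ONE_YEAR
        and "includesubdomains" in directives
        and "preload" in directives
    )
```

For example, `preload_ready("max-age=63072000; includeSubDomains; preload")` passes, while a header with only `max-age=31536000` does not.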

Late in 2013, to encourage all sites to deploy TLS encryption, Google announced its Chrome browser would begin to warn people when accessing insecure web pages and it would rank unencrypted sites lower in its search results.

Because of Google’s policy and the security community as a whole pushing hard, we doubled the number of sites supporting secure connections in just three years. Google statistics now show that in most countries, sites visited by Chrome users are encrypted ~95% of the time.

The most recent move by browser vendors to push us into an always encrypted world began in November of 2020 when Mozilla introduced an HTTPS-only option to Firefox. When enabled, this feature attempts all connections securely over HTTPS and falls back to a warning if HTTPS is not available. Chrome followed by adding a similar option and turned it on by default in April 2021.

That’s fantastic progress, but aside from the high rates of encryption, are people deploying technologies like HSTS and is it used widely enough to help protect users on untrusted networks?

Research into the current adoption of encryption and HSTS

To conduct this research, I decided to survey the top 15,000 domains as visited by customers of Sophos products that report telemetry back to SophosLabs. I filtered out some of the duplicates and other invalid links, such as 127.0.0.1 or localhost. This left me with a set of 13,931 total valid URIs to assess. (The raw data is available on the Sophos GitHub page.)

I worked with an IT (Information Technology) security student from the Southern Alberta Institute of Technology, Kovan Mohammed Ameen, to scrape all the HTTP headers as observed in the URI set using a Python script and save them to an NDJSON file for analysis in Kibana. While the raw results themselves tell a story, the devil is always in the details.
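The script used for the survey is not reproduced in this article, so the following is only a sketch of how such a scrape might look using the Python standard library. The function names, the use of HEAD requests, the 10-second timeout and the record layout are all assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request


def fetch_headers(url: str, timeout: float = 10.0) -> dict:
    """Fetch one URL and return a record of its response headers.

    Failures are captured in the record instead of raised, so one
    unreachable site does not stop a long survey run.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # Lower-case the names; a repeated header keeps its last value.
            headers = {k.lower(): v for k, v in response.headers.items()}
            return {
                "url": url,
                "final_url": response.geturl(),  # URL after any redirects
                "status": response.status,
                "headers": headers,
            }
    except (urllib.error.URLError, OSError, ValueError) as exc:
        return {"url": url, "error": str(exc)}


def write_ndjson(records, handle) -> None:
    """Write one JSON object per line (newline-delimited JSON)."""
    for record in records:
        handle.write(json.dumps(record) + "\n")
```

A run over a list of URLs could then open an output file and call `write_ndjson(map(fetch_headers, urls), handle)`. Note that `urlopen` follows redirects, so the headers recorded are those of the final response; a survey that cares about the first hop would need to handle redirects itself.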

Of the collected URIs, 1,483 (10.64%) were served over HTTP. This is in line with publicly available data from Mozilla and Let’s Encrypt showing current HTTPS adoption rates. The remaining 12,448 (89.36%) were served over HTTPS, and of those, 3,947 (31.71%) had set an HSTS header.
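The shares can be recomputed from the three counts given above. This is just the arithmetic, with hypothetical variable names; summing the HTTP and HTTPS counts gives 13,931 URIs, and the results agree with the quoted percentages to within rounding in the second decimal place.

```python
# Counts taken from the survey results quoted in the text.
http_count = 1483      # URIs served over plain HTTP
https_count = 12448    # URIs served over HTTPS
hsts_count = 3947      # HTTPS URIs that set an HSTS header


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


total = http_count + https_count                  # all surveyed URIs
no_hsts_count = https_count - hsts_count          # HTTPS without HSTS

http_share = percent(http_count, total)           # share served over HTTP
https_share = percent(https_count, total)         # share served over HTTPS
hsts_share = percent(hsts_count, https_count)     # share of HTTPS with HSTS

print(total, no_hsts_count, http_share, https_share, hsts_share)
```

The 8,501 HTTPS sites without HSTS that fall out of the subtraction are the group examined later in the article.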

HTTP

At first glance, this doesn’t look nearly as good as we might hope. To begin with, I wanted to look at HTTP sites, which should not exist in 2021, to see what was still using this legacy protocol and what risk it might represent.

In the old days of Flash and Java, exploits were easy to discover and employ en masse. These days exploitation is less likely, and unencrypted HTTP is mainly a privacy and data theft issue.

I was unable to test all 1,483 by hand, but I did inspect the whole list visually and browsed the top 100, sorted by prevalence, by hand.

Of those, 14 were interstitial pages that redirected to HTTPS sites on a different domain. This is a bad practice as you are unable to apply an HSTS header and the HTTP session is subject to hijacking each time someone visits the domain. It is better to redirect to an HTTPS page on the same domain with an HSTS header set before redirecting to the new page.
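The recommended pattern can be expressed as a simple check on the first redirect hop. This is a minimal sketch with a hypothetical function name; it assumes you already have the HTTP URL, the redirect target, and the headers served by that target (for example, from a header scrape).

```python
from urllib.parse import urlsplit


def first_hop_is_safe(http_url: str, redirect_target: str, target_headers: dict) -> bool:
    """Check that an HTTP URL first redirects to HTTPS on the same host,
    and that this HTTPS response sets an HSTS header.

    Only then can the browser remember to use HTTPS for the original
    domain before moving on to any other domain.
    """
    source = urlsplit(http_url)
    target = urlsplit(redirect_target)
    same_host = source.hostname == target.hostname  # hostname is lower-cased
    has_hsts = any(
        name.lower() == "strict-transport-security" for name in target_headers
    )
    return target.scheme == "https" and same_host and has_hsts
```

An interstitial page that jumps straight from `http://example.com/` to `https://www.example.net/` fails this check, because the HSTS policy set by the second domain never applies to the first.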

Also in the top 100 were at least 50 ad or tracking sites. Determining precisely by domain can be tricky, but I am confident that more than half are in the marketing industry. Many of the remaining sites were what you might expect, such as pages that OS (Operating System) manufacturers use to test whether they are behind a captive portal (public Wi-Fi login page) or whether a computer’s network supports IPv6 (Internet Protocol Version 6).

There were no particularly concerning findings in the top 100 HTTP sites, aside from the massive number of trackers, beacons and analytics in use and that none of them can be bothered to implement encryption properly.

HTTPS without HSTS

The second group of sites comprises the largest total number, 8,501. For these sites to be impersonated, one of a few conditions must be met. An attacker could perform a downgrade attack to redirect the victim to an unencrypted connection they control. They could convince a certificate authority (CA) to issue a valid certificate to them, allowing them to impersonate the site over an encrypted connection. Lastly, they could hope to convince the victim to click through the rather onerous warnings displayed by their browser to connect to their site encrypted by a forged certificate.

Considering that it is still possible to use MitM techniques to compromise these connections, my hope was that very few sites with sensitive login details, like banks, email providers, social media sites, and shopping sites would be in this list.

I began by looking at the top 100. Once again ads, analytics and content delivery networks (CDNs) took 14 of the top 20 spots on the list. I’ve included a list of the most surprising outliers in the table below.

High-profile sites without HSTS	High-profile sites with HSTS only on login
comodoca.com	godaddy.com
xiaomi.com	samsung.com
jquery.com	thawte.com
yahoo.co.jp	nvidia.com
sectigo.com	rapidssl.com
globalsign.com	bbc.co.uk
mailchimp.com	amazonvideo.com
cnn.com
rakuten.co.jp
github.io

I was both surprised and disappointed to see so many security vendors on this list. I was also disappointed to see Yahoo! Japan on this list. All other Yahoo! countries use HSTS as far as I can tell, but not Japan, not even when logging into Yahoo! Mail on the Japan site.

It is important to note that deploying HSTS on only part of a site, especially parts other than the main landing page, is about as effective as not using HSTS at all. If an attacker can compromise a landing page, they can simply redirect the “sign in” link to a different URL than the original sign-in page and continue to gather credentials.

Nearly all of the sites that exhibited this behavior were using third-party authentication providers. This suggests they are not using HSTS, but their designated provider is. As mentioned above, that is not effective protection, but it is a good sign that authentication providers understand the value of using HSTS.

The only conceivable reason to not deploy HSTS is that website operators either don’t know about it or think the site might go back to an unencrypted state. There’s no going back to the old days, so we should work to get more web administrators on board with ratcheting up their security one more notch. Nearly 5% of the sites without HSTS at the time I gathered the data (April 2021) have now rolled it out (November 2021), including some heavy hitters like hp.com.

HTTPS with HSTS

What is the point of pointing out the secure sites?

Well, not all of them are configured correctly. HSTS includes a configuration value called max-age. It is intended to tell your web browser how long, in seconds, to remember to only use TLS when connecting to a site. The recommended value is 31,557,600 or approximately one year.
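To make the role of max-age concrete, here is a small sketch that extracts the value from a header and sorts it into rough buckets. The function names and bucket labels are hypothetical, and the threshold is simply the one-year figure quoted above; a max-age of 0 is treated as disabled because it tells the browser to stop remembering the policy.

```python
import re

RECOMMENDED_MAX_AGE = 31_557_600  # one-year value quoted in the text, in seconds


def max_age_seconds(header_value: str):
    """Return max-age as an int, or None if the directive is absent."""
    match = re.search(r'(?i)(?:^|;)\s*max-age\s*=\s*"?(\d+)"?', header_value)
    return int(match.group(1)) if match else None


def classify_hsts(header_value: str) -> str:
    """Bucket an HSTS header by how long browsers will remember it."""
    age = max_age_seconds(header_value)
    if age is None:
        return "invalid"            # no usable max-age directive
    if age == 0:
        return "disabled"           # browser forgets the HSTS policy
    if age < RECOMMENDED_MAX_AGE:
        return "below_recommended"  # active, but shorter than a year
    return "recommended"
```

Running `classify_hsts` over every scraped header gives a quick count of how many sites fall into each bucket.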

Strangely, 4.86% of the sites advertising an HSTS header had max-age set to 0, which effectively disables it. Why would a company do that? Some very big names were in this short list, including oracle.com. I combed through the list and found two companies in the security industry where I had personal contacts.

When I reached out to them to inquire as to why they had this configuration, the response was swift and the same in both cases. It was a legacy setting that had been set during the development of the website and was supposed to be changed when the site moved to production. They had disabled HSTS to make testing easier without the need to renew security certificates and simply forgot to undo the setting.

This is likely true for many of the sites with this configuration. Development-to-production issues are not a new problem and often lead to security misconfigurations and data leakage issues. It is worthwhile for all web administrators to review all security settings on a regular basis, not just when moving from development to production, because best practices evolve as we continue to raise the baseline of what is “good enough.”

Conclusion

The web has never been safer.

With 95% of web pages encrypted, and most of the remainder posing little risk, this is great news, especially during any of the busy online shopping seasons.

Bit by bit, the security community has worked together to improve standards, apply pressure on laggards and lower the costs of communicating securely over the internet. The amount of progress that has been made is impressive considering what the scale of the problem once was.

However, the job is by no means complete. Only 31.7% of the HTTPS sites surveyed use HSTS, which shows that even features that are free and provide significant security improvements are not as widely deployed as they should be.

Securing the application layer has massive implications for users and safety. There’s still a risk the providers of the networks we use will spy on us, sell us to advertising networks or will be compromised by cybercriminals.

But, because of HSTS and TLS, you can pretty much browse and communicate as freely as you please with negligible risk of a bad outcome, even over untrusted Wi-Fi and mobile networks.

Chester Wisniewski
