Sophos News

Memex – DARPA’s search engine for the Dark Web

Anyone who used the World Wide Web in the nineties will know that web search has come a long way. Sure, it was easy to get more search results than you knew what to do with in 1999 but it was really hard to get good ones.

What Google did better than Alta Vista, HotBot, Yahoo and the others at the dawn of the millennium was to figure out which search results were the most relevant and respected.

And so it’s been ever since – search engines have become fast, simple interfaces that compete based on relevance and earn money from advertising.

Meanwhile, the methods for finding things to put in the search results have remained largely the same – you either tell the search engines your site exists or they find it by following a link on somebody else’s website.

That business model has worked extremely well but there’s one thing that it does not excel at – depth.

If you don’t declare your site’s existence and nobody links to it, it doesn’t exist – in search engine land at least.
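To make that concrete, here is a minimal sketch of link-following discovery on a toy, in-memory "web". It is an illustration under stated assumptions, not how any real search engine is implemented: the page names, the `TOY_WEB` dictionary and the `discover` function are all hypothetical, and no network access is involved.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": [],
    "hidden.example": ["a.example"],  # links out, but nothing links in
}


def discover(web, submitted):
    """Return every page reachable from the submitted sites by following links."""
    seen = set(submitted)
    queue = deque(submitted)
    while queue:
        page = queue.popleft()
        for target in web.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Only a.example has been declared to the (imaginary) search engine.
indexed = discover(TOY_WEB, ["a.example"])
print(sorted(indexed))              # ['a.example', 'b.example', 'c.example']
print("hidden.example" in indexed)  # False
```

In this sketch `hidden.example` is never found because nobody declared it and no page links to it, even though it is sitting there in the toy web all along. Pass it in as a submitted site and it turns up straight away.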

Google’s stated aim may be to organize the world’s information and make it universally accessible and useful, but it hasn’t succeeded yet. That’s not just because it’s difficult; it’s also because Google is a business and there isn’t a strong commercial imperative for it to index everything.

Estimates of how much of the web has been indexed vary wildly (I’ve seen figures of 0.04% and 76% so we can perhaps narrow it down to somewhere between almost none and almost all) but one thing is sure: there’s enough stuff that hasn’t been indexed that it’s got its own name – the Deep Web.

It’s not out of the question to suggest that the part of the web that hasn’t been indexed is actually bigger than the part that has.

A subset of it – the part hosted on Tor Hidden Services and referred to as the Dark Web – is very interesting to those in law enforcement.

There are all manner of people, sites and services that operate over the web that would rather not appear in your Google search results.

If you’re a terrorist, paedophile, gun-runner, drug dealer, sex trafficker or serious criminal of that ilk then the shadows of the Deep Web, and particularly the Dark Web, offer a safer haven than the part occupied by, say, Naked Security or Wikipedia.

Enter Memex, brainchild of the boffins at DARPA, the US government agency that built the internet (then ARPANET).

DARPA describes Memex as a set of search tools that are better suited to government (presumably law enforcement and intelligence) use than commercial search engines.

Whereas Google and Bing are designed to be good-enough systems that work for everyone, Memex will end up powering domain-specific searches that are the very best solution for specific narrow interests (such as certain types of crime).

Today's web searches use a centralized, one-size-fits-all approach that searches the internet with the same set of tools for all queries. While that model has been wildly successful commercially, it does not work well for many government use cases.

The goal is for users to ... quickly and thoroughly organize subsets of information based on individual interests ... and to improve the ability of military, government and commercial enterprises to find and organize mission-critical publically [sic] available information on the internet.

Although Memex will eventually have a very broad range of applications, the project’s initial focus is on tackling human trafficking and slavery.

According to DARPA, human trafficking has a significant Dark Web presence in the form of forums, advertisements, job postings and hidden services (anonymous sites available via Tor).

Memex has been available to a few law enforcement agencies for about a year and has already been used with some success.

In September 2014, sex trafficker Benjamin Gaston was sentenced to a minimum of 50 years in prison having been found guilty of “Sex Trafficking, as well as Kidnapping, Criminal Sexual Act, Rape, Assault, and Sex Abuse – all in the First Degree”.

Scientific American reports that Memex was in the thick of it:

A key weapon in the prosecutor's arsenal, according to the NYDA's Office: an experimental set of internet search tools the US Department of Defense is developing to help catch and lock up human traffickers.

The journal also reports that Memex is used by the New York County District Attorney’s Office in every case pursued by its Human Trafficking Response Unit, and it has played a role in generating at least 20 active sex trafficking investigations.

If Memex carries on like this then we’ll have to think of a new name for the Dark Web.


Image of Fractal Texture spiral Dark Web Abstract Nether licensed under Creative Commons, courtesy of TextureX on DeviantArt