Sophos News

Is DARPA’s Memex search engine a Google-killer?

The history of computing features a succession of organisations that looked, for a while at least, as if they were so deeply embedded in our lives that we’d never do without them.

IBM looked like that, and Microsoft did too. More recently it’s been Google and Facebook.

Sometimes they look unassailable because, in the narrow territory they occupy, they are.

When they do fall, it isn’t because somebody storms that territory; it’s because the ground beneath them shifts.

For years and years Linux enthusiasts proclaimed “this will be the year that Linux finally competes with Windows on the desktop!”, and every year it wasn’t.

But Linux, in the form of Android, eventually smoked Microsoft when ‘Desktop’ gave way to ‘Mobile’.

Google has been the 800-pound gorilla of web search since the late 1990s, and all attempts to out-Google it have failed. Its market share is rock solid and it’s seen off every challenger, from lumbering tech leviathans to nimble, disruptive startups.

Google will not cede its territory to a Google clone, but it might one day find that its territory is not what it was.

The web is getting deeper and darker and Google, Bing and Yahoo don’t actually search most of it.

They don’t search sites on anonymous, encrypted networks like Tor and I2P (the so-called Dark Web), and they don’t search sites that have asked to be ignored or that can’t be found by following links from other websites (the vast, virtual wasteland known as the Deep Web).

The big search engines don’t ignore the Deep Web because there’s some impenetrable technical barrier that prevents them from indexing it – they do it because they’re commercial entities and the costs and benefits of searching beyond their current horizons don’t stack up.

That’s fine for most of us, most of the time, but it means that there are a lot of sites that go un-indexed and lots of searches that the current crop of engines are very bad at.

That’s why the US Defense Advanced Research Projects Agency (DARPA) set out to build a search engine for the Deep Web, called Memex.

Memex is designed to go beyond Google’s one-size-fits-all approach and deliver domain-specific search, which serves narrow interests far better.

In its first year, it’s been tackling human trafficking and slavery, problems that, according to DARPA, have a significant presence beyond the gaze of commercial search engines.

When we first reported on Memex in February, we knew that it would have potential far beyond that. What we didn’t know was that parts of it would become available more widely, to the likes of you and me.

A lot of the project is still somewhat murky and most of the 17 technology partners involved are still unnamed, but the plan seems to be to lift the veil, at least partially, over the next two years, starting this Friday.

That’s when an initial tranche of Memex components, including software from a team called Hyperion Gray, will be listed on DARPA’s Open Catalog.

The Hyperion Gray team described their work to Forbes as:

Advanced web crawling and scraping technologies, with a dose of Artificial Intelligence and machine learning, with the goal of being able to retrieve virtually any content on the internet in an automated way.

Eventually our system will be like an army of robot interns that can find stuff for you on the web, while you do important things like watch cat videos.

More components will follow in December and, by the time the project wraps up, a “general purpose technology” will be available.

Memex and Google don’t overlap much: they solve different problems, serve different needs and are funded in very different ways.

But the same was once true of Linux and Microsoft.

The tools that DARPA releases at the end of the project probably won’t compete directly with Google, but I expect them to be mature and better suited than Google to certain government and business applications.

That might not matter to Google, but there are three reasons why Memex might catch its eye.

The first isn’t news, but it’s true nonetheless: the web is changing, and so is the way we use the internet.

When Google started, there was no Snapchat, Bitcoin or Facebook. Nobody cared about the Deep Web because it was hard enough to find the things you actually wanted, and nobody cared about the Dark Web (remember Freenet?) because nobody knew what it was for.

The second is this statement made by Christopher White, the man heading up the Memex team at DARPA, who’s clearly thinking big:

The problem we're trying to address is that currently access to web content is mediated by a few very large commercial search engines - Google, Microsoft Bing, Yahoo - and essentially it's a one-size fits all interface...

We've started with one domain, the human trafficking domain ... In the end we want it to be useful for any domain of interest.

That's our ambitious goal: to enable a new kind of search engine, a new way to access public web content.

And the third is what we’ve just discovered: Memex isn’t just for spooks and G-Men; it’s for the rest of us to use and, more importantly, to play with.

It’s one thing to use software and quite another to be able to change it. The beauty of open source software is that people are free to take it in new directions, just as Google did when it built Android on top of Linux.
