Pulling Rank: The Legacy of Alexa Internet

Alexa Internet and the Internet Archive, two seemingly unrelated entities, have been partners ever since their inception. Alexa’s sunset, scheduled for 1 May 2022, is therefore also a loss for the web archiving community. As a small send-off to Alexa, here is the story of two twins who grew apart together.

Today, the internet has become such a big part of our lives that it’s hard to imagine a time without it. Yet only 30 years ago, the internet was hardly accessible to anyone. Not in the sense that it wasn’t affordable; rather, what could be called the internet wasn’t very inter-connected. You had separate networks: ARPANET, which was heavily linked to the US’s military-industrial complex; FidoNet, which was a worldwide network connecting BBSs; USENET, a system of newsgroups mostly adopted on university campuses… Each network had a particular use-case and was often restricted to a particular demographic. It wasn’t until the vision of an “open web” took hold that a common internet would emerge.

In the early 90s, many disillusioned DARPA contractors began leaving ARPANET on an exodus to San Francisco, synergising with the city’s pre-established tech ecosystem. Maybe it was the advent of new protocols such as Gopher and the World Wide Web. Perhaps it was the growing Free Software Movement. Not to mention the gravitation towards the technology clusters of Silicon Valley or the Homebrew Computer Club. It was more than happenstance that California, and the San Francisco Bay Area in particular, had become home to a lot of network engineering experts.

The tricky question wasn’t how to get the internet to more people, but how to do it the fastest. Many small companies, startups, and even NGOs popped up in San Francisco to address the different challenges of building a massive network, from building infrastructure by laying wires to law firms dealing with the bureaucracy. Of course, there were also companies dealing with the software problems on top of the hardware.

One such company was Alexa Internet, founded by Bruce Gilliat and Brewster Kahle. Alexa started as a recommendation system, to help users find relevant sites without having to manually search for everything. On every page, users would get a toolbar showing them “recommended links”. You may think of these recommended webpages like suggested videos on YouTube or songs on Spotify. Alexa was “free to download” and came with ads.

Those recommendations had to come from somewhere, and Alexa’s weren’t just randomised or purely user-based. Their secret was collecting snapshots of webpages through a crawler named ia_archiver (more on that later). This way they were able to collect stats and metrics on the webpages themselves, over time. This is how Alexa’s most well-known feature, Alexa Rank, came to be. Which sites are the most popular, in which categories, and when? Over time, this emphasis on web analytics became Alexa’s competitive advantage.

Alexa was a successful business that only kept growing, but founder Brewster Kahle had something of an ulterior motive. He was also in the midst of starting a non-profit organisation called the Internet Archive. ia_archiver did, in fact, stand for internetarchive_archiver. All the while Alexa was amassing this web data, it was also collecting it for long-term preservation at this up-and-coming Internet Archive. In fact, one can tell the two were interlinked ideas from the very start, as the name Alexa was an obvious nod to the Library of Alexandria. At one point, Alexa (not the Internet Archive) made a donation of web data to the US Library of Congress, as a bit of a publicity stunt to show the merit of what they were doing.

[For your video], there is this robot sort of going and archiving the web, which I think is somewhat interesting towards your web history. It’s a different form. You’re doing an anecdotal history. The idea is to be able to collect the source materials so that historians and scholars will be able to do a different job than you are now.
Brewster Kahle, teasing his vision for the Internet Archive in a 1996 interview by Marc Weber (Computer History Museum). Fast-forward to 31:53 in the video below.

Tim Požar and Brewster Kahle CHM Interview by Marc Weber; October 29 1996.
Mirror on Internet Archive: https://archive.org/details/youtube-u2h2LHRFbNA

For the first few years, Alexa and the IA enjoyed this dual nature: one side a for-profit company and the other a charitable non-profit, both committed to taking meta-stats on the wider internet. A turning point came in 1999, when Amazon decided to acquire Alexa Internet (not the smart home product) for approx. US$250 million. Alexa needed growth and the IA needed funding, so it was a happy day for everyone, even if it meant that the two would no longer act as a single entity.

Kahle left the company to focus on the IA, and former partner Gilliat ended up becoming the CEO of Alexa. An arrangement was reached so that, even after the acquisition, Alexa would continue donating crawled data to supply the Internet Archive. Their collaborator Tim Požar, whom you might recognise from the ’96 interview above, would remain at Alexa for some time as a backend engineer. Much of what Požar did was ensuring that Alexa’s crawled data would continue to be rerouted to the Internet Archive. Many of these data dumps are now visible under the IA’s Alexa crawls collection.

Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.

Afterwards, the IA and Alexa went their separate ways. The Internet Archive expanded to non-web digital collections as well. Books, in particular. The web archive part was dubbed the Wayback Machine.

By 2001, the Internet Archive was no longer a private collection but was made open to the public for browsing. The Internet Archive really lived up to its name and became the de facto hub for archiving on the web. Ever since, the IA has continued to attract not only readers, but also contributors who keep growing the collections.

As for Alexa, Amazon’s bet paid off as they dominated web analytics for the coming years. Alexa rankings became the standard metric when comparing web traffic, for example on Wikipedia. Alexa listed some public stats free to all, but remained profitable thanks to a tiered subscription system. If you needed to know the 100 largest blog sites in a given country, Alexa was your friend. Then you could pay a few dollars extra to find out which countries were visiting your competitors the most. Alexa was great, so long as you were interested in websites.

Alexa Stats for Data Horde, April 28 2021

Alexa was born in a very different web. A web of sites. Yet today’s web is a web of apps. Social media, streaming services… The statistics of this web of apps are kept by centralised app markets such as Google Play and Apple’s App Store. Alexa tried to adapt; for example, they changed traffic stats to rely less on crawl data across the entire web and to also factor in shares posted to Twitter and Reddit. Sadly, these changes have not been impactful enough to save Alexa from obsolescence.

(Google Search Trend showing the rise and fall of alexa rank, alternative link.)

Amazon telegraphed their intent to either adapt or shut down by gradually dropping features over the past few months. For example, they replaced Browse by Category with a narrower Articles by Topic. Finally, the service closure was announced in December 2021.

Webarchive for Alexa Categories page redirecting to Articles

So what will happen now? The closing of Alexa is different from most shutdowns because it’s not only the loss of data itself, but of a data stream. Alexa was, indeed, at one time a web crawling powerhouse. Yet it’s no longer uncontested. We still have, for example, Common Crawl, which, interestingly, also came out of Amazon. As for the Internet Archive, they have many partners and collaborators to continue crawling the web as well, so they won’t be alone.

Alexa was also valuable in its own right. Though there are new competitors for web analytics, you won’t see many investigating global or regional popularity, or different categories. Nor are there very many services interested in overall web traffic, as opposed to site analytics. On top of this, Alexa ran for 25 years. That’s a quarter of a century of historical data on which sites rose and fell on Alexa’s watch, unavailable almost anywhere else. Almost.

Just as Alexa helped the Internet Archive grow, from this point on the Internet Archive shall reciprocate by keeping the memory of Alexa alive: not just in the sites crawled by Alexa, but also in snapshots of the public statistics gathered by Alexa.

If you have an Alexa account, you can also help! Users can export Alexa data by following the instructions here. You can bet any and all data would be very valuable, either on the Internet Archive or elsewhere. Please make sure you act quickly, as there isn’t much time left until May 1.
