internet – Data Horde

Community Spotlight: Pikmin Archives

themadprogramer — Tue, 20 Feb 2024 21:30:09 +0000

Who are they?

Pikmin Archives is a group dedicated to collecting developer notes and promotional material on the Pikmin series of games.

What do they do?

The Pikmin series consists of RTS games where players must guide a swarm of aliens to thrive in the wild! The Pikmin games are very memorable for their unique artstyle, contrasting everyday objects against sci-fi technology and fantastical nature. Celebrating that artstyle, Pikmin Archives is focussed on documenting the creative process behind the Pikmin games.

For example, Pikmin Archives member Flamsey restored the old Pikmin 2 USA website, which had ceased to function due to the discontinuation of Flash Player.

Got part of the Pikmin 2 USA site to work. pic.twitter.com/8LXVE1pqiq
— Flamsey (#1 Hey Pikmin Fan) (@_Flamsey) August 25, 2023

How do they do it?

Pikmin Archives is most active on their Discord server, which is a hub for exchanging files and fostering discussion. There, a dedicated #archive-submissions channel is used to submit media, and submissions are then curated by the mod team.

Occasionally, members might post their findings to Twitter; but there is no dedicated Pikmin Archives social media account or website at this time.

How do I sign up?

Just hop on board their Discord Server!

So what are you waiting for? Become a Pikmin Archivist, today!

Looking to discover other archiving communities? Just follow Data Horde’s Twitter List and check out our other Community Spotlights.

Without being exploited: What archivists should learn from the XeNTaX forums aftermath

themadprogramer — Thu, 16 Nov 2023 09:40:28 +0000

Some 6 months ago, in May 2023, a post was made on r/DataHoarder announcing that the XeNTaX wiki and forum were shutting down due to financial considerations. As with any forum shutdown, much panic ensued. However, of the few people I have spoken to about this shutdown, no one really seemed to be aware of XeNTaX before this.

Depending on where you look online you may be led to believe XeNTaX is/was a company, supposedly a foundation and definitely a website. Yes, there is a XeNTaX website, xentax.org, distinct from the XeNTaX forums at forum.xentax.com. In actuality, XeNTaX has its roots in the Dutch demoscene and it has just kept reincarnating.

A Xentax song composed for the X’98 compo

XeNTaX started as a team of two, Mr. Mouse and Captain Corney, who were hacking/modding Commodore 64 games. XeNTaX grew into a much wider community over time because Mr. Mouse and Captain Corney wanted to be able to focus on retrocomputing and to support others working on similar projects. For this, XeNTaX developed MultiEx Commander, a tool for unarchiving 100+ retro game formats, certainly no longer limited to C64.

On October 6, XeNTaX made a more upfront shutdown announcement[Wayback] with the shutdown being scheduled for the end-of-year. While there was still some possibility of a buyout or handover, it was unlikely. Instead, the XeNTaX community was encouraged to join the XeNTaX Discord server. Again, no surprises there: it has become fairly routine for old forums to retire to Discord which offers free hosting and a ton of features.

With this announcement, a second wave shot out. Word got out once again, leading to several mass archiving efforts. However, this upset the staff enough to issue a warning on the Discord, with an emphasis on data privacy and consent. To quote Mr. Mouse:

Note: Members of the Xentax Forum have agreed to terms of the Forum and any public information. They have not agreed for their information being used on other sites. You may wish to look into the subject of data privacy. As such, while you’ve leeched my posts, I did not agree for those being hosted somewhere else. So remove my posts.

…

Remember to ensure approval from people before you put their stuff up that they did not agree to. In this age of data privacy and consent that is very important. As for Wayback Machine, they have a process that enables removal of pages if asked and are usually collaborative.
XeNTax Discord

This was a remarkable reaction because two things are being said here. First is the obvious point on data privacy and consent, but second is an undertone of leeching off of previous work and exploitation. The fact that the XeNTaX forums have shut down does not mean that the staff and contributors have quit completely. They are still around and will frown upon their work being plagiarised now just as much as they would have while the forums were alive. And that is an issue most fellow archivists and hoarders have been fairly negligent of.

Amidst the archiving craze focussed on preserving the record, there was also a second preservation effort going on. An effort to preserve community. Although the XeNTaX Discord server offered a solution, many did wish for an independent forum. Even a short gofundme was run to see if maintenance costs could be crowdsourced.

The shutdown date was pulled a bit forward to November 3, 2023 as members were instructed to relocate to a new forum, Reshax, per the updated XeNTaX forum banner[Wayback]. In fact, when the forum did first shut down it began immediately redirecting to Reshax.

I’ve reached an agreement with Mr. Mouse, the owner of the Xentax forum, to promote ResHax and breathe new life into the slowly declining forum. Additionally, I’ll make an effort to bring tools from their site to ours. Once their forum becomes inactive, I’ll attempt to persuade Mr. Mouse to redirect the domain to our forum, ensuring that all users can find a new home here
Reshax admin michalss, “What about Xentax and Zenhax ?” on ResHax, Wayback Snapshot.

michalss also lamented the recent death of the sister community Zenhax, which was abandoned due to the owner losing interest. And this could have been the end of the story, but people kept begging, asking “where are the tools, where are the assets?”…

On November 8, XeNTaX Discord Admin Richard Whitehouse came out with an announcement, later also shared on his homepage: Reshax and XeNTaX had reached an alternative agreement. From this point on, Reshax would be free to focus on reverse engineering however they pleased, and XeNTaX members would be free to continue the tools and projects that they were already making. Whitehouse paints a picture of how he believes the XeNTaX community has been unfairly taken advantage of, and that this was a destructive force.

Many developers stopped sharing their findings and specifications (myself included) because they started to see their work exploited. By companies, which is morally reprehensible (and sometimes in direct violation of a given license/copyright) and serves to devalue the entire skillset associated with the labor. By other developers, who are socially positioned to exploit the labor in some other way. By people who just want to rip content to turn around and sell it, or claim false credit for it. In conjunction with unhealthy ego competition, this exploitation has made it impossible to create a culture of trust and sharing between developers.

…

We want to create an environment where developers are safe to work together without being exploited, and where developers feel valued by fellow developers enough to not feel the need to engage in pathetic ego-based assertions of skill. We want people to be fueled by their creative ambitions and technical fascinations, not their social standing. We want to create a culture beyond what Open Source can achieve under the constraints of our current socioeconomic systems. No matter how many people are left standing in the end, this is where we’re going.
Richard Whitehouse

On r/DataHoarder and other venues, the XeNTaX forum shutdown was treated as nothing more than a lost cause. There was once a XeNTaX, now there isn’t; we must therefore uphold the memory through downloading all we can. But to the alive and well XeNTaX community, these forum dumps were nothing more than an intensification of the routine stealing of their work they had grown sick of. Whitehouse’s open letter, which I have only abridged here, makes it clear whom the Discord staff consider a XeNTaX contributor: someone willing to invest time and effort to learn, as opposed to internet passersby who ask for something, take it and move on.

To further hammer in the point, Mr. Mouse issued another announcement on November 12 imploring members not to share full backups of the XeNTaX Forum on the XeNTaX Discord server. Once again, the Internet Archive and the Wayback Machine were exempted as special cases, but otherwise it was not allowed. This, however, did attract some internal protest from guild members, as one might gather from the reactions to the message.

Discord Announcement on XeNTaX

This goes to show that the Internet Archive has built up enough of a reputation to not merely be written off as leechers and pirates, and that’s a good thing. However, there is an implication here that websites just find their way onto the Internet Archive, when in fact there are automation processes and groups like Archive Team who facilitate this. Thus we find ourselves in a Catch-22, where if something has landed on the Internet Archive it is deemed legitimate, but if it is stuck in transit it was stolen unfairly.

This is a paradox that underpins the challenge of being an archivist today: success means being invisible and that your archives are never widely distributed. Does that perhaps sound familiar? It’s the exact same situation the XeNTaX community finds itself in. They would rather preserve their tools and assets internally, circulating on a need-to-know basis, than have them out in the open. This ensures that the community retains its knowledge, but also controls it. It’s self-determination against potential exploitation.

The XeNTaX situation is not over, and hopefully it will not be over any time soon. The XeNTaX forums might be gone, but XeNTaX lives on. And I believe it sets a good example: Archivism as a hobby or profession is something which should prevail within every community, instead of the interventionist culture from third parties that we have grown accustomed to today.

But that reversal we have is warranted. Many times communities do vanish or are made to vanish, whether it’s subtitlers on YouTube or artists who can no longer use Macromedia Flash. Often times, these communities do not have an obvious way of preserving their memories; the decision is out of their control and attempts at preservation necessitate challenging authority, ad hoc solutions and technical expertise (often from outside).

Whether you define yourself as an archivist, a hoarder, a pirate, a cracker, an archaeologist or whatever; it is a must that you understand where the files come from. You don’t have to obey all of the wishes of the original creator, but you have to respect them. Especially if they’re still alive and kicking. The costs couldn’t kill XeNTaX, but from the looks of it archivists almost did.

Data Log 2023-10-06 Weekly News

themadprogramer — Fri, 06 Oct 2023 21:52:46 +0000

Discord CDN – FIFA delisting – Nintendo online shutdowns – Typepad – Goodboy Galaxy – Matrix

Jason Citron quote is from New York Times
Discord CDN filehosting changes rumor and announcement
EA/FIFA game delistings
Nintendo 3DS/Wii U online service shutdown
Typepad
Goodboy Galaxy Interview
Discord-Matrix Bridge: Out-of-your-element

BG: Jungle Waterfalls by Mark Ferrari.

Music: Meadow Breeze written by TECHNOTRAIN on Dova Syndrome.

Data Log 2023-09-30 Intro to Phone Preservation

themadprogramer — Sun, 01 Oct 2023 21:09:44 +0000

What did you have before smartphones? Madpro and Donut talk about old phones, retro phones and retrofuturistic phones.

Podcast: https://feeds.acast.com/public/shows/data-log
Old Phone Preservation: https://twitter.com/OldPhonePreserv
More info on i-Mode: https://www.gamingalexandria.com/wp/2020/04/complete-guide-to-i-mode-games/
Emoji and the Unicode Consortium: https://home.unicode.org/emoji/about-emoji/
Snake on 7-Segment Displays, DIY project for the insane engineers among you: https://hackaday.io/project/166556-snake-on-7-segment-displays
7-segment by Vishnu Mohanan on Unsplash: https://unsplash.com/photos/9mSe-QS5JrA
3G by SoQ錫濛譙 on Flickr: https://www.flickr.com/photos/qiaomeng/5059237997
Broadcasting tower in Trondheim, Norway on Wikimedia: https://en.wikipedia.org/wiki/Radio_broadcasting#/media/File:Tyholt_taarnet.jpg

Data Log 2023-09-17 Unity Platform Runtime Fee Controversy

themadprogramer — Sun, 17 Sep 2023 23:07:03 +0000

The Unity Engine is a popular 3D engine for making games and other interactive media. In this episode of Data Log glmdgrielson and madpro talk about how game designers and gamers are upset with the Unity platform’s new payment scheme.

David Helgason Interview: https://www.pocketgamer.biz/interview/58959/not-everyone-needs-to-be-king-or-supercell-unitys-david-helgason-on-unity-5-everyplay-and-unreal-engine/
Unity Pricing Changes Blogpost: https://blog.unity.com/news/plan-pricing-and-packaging-updates
Cult of the Lamb Steam Page Announcement: https://store.steampowered.com/news/app/1313140/view/7249162346641740420
The Chaos at Unity by Alex Heath: https://www.theverge.com/2023/9/15/23875408/command-line-the-chaos-at-unity
BlueMaxima’s Flashpoint (Home to some Web Unity Games): https://flashpointarchive.org/
Fundraiser for Donkey Kong Programmers Story: https://hitsave.org/fundraiser-translation-of-donkey-kong-programmers-story/

Data Log 2023-01-26 What is Archiving?

themadprogramer — Thu, 26 Jan 2023 23:37:35 +0000

The first ever episode of Data Log: The Archiver’s Favorite Podcast. Learn about what archiving is and how to join the archiving community!

Ruffle: https://ruffle.rs/
Public Domain Day 2023 Film Contest: https://blog.archive.org/2023/01/21/public-domain-day-film-contest-highlights-works-of-1927/
Internet Archive: https://archive.org/
Internet Archive Twitter: https://twitter.com/internetarchive
Internet Archive Mastodon: https://mastodon.archive.org/users/internetarchive
Jason Scott: https://twitter.com/textfiles
Center for MI Jewish Heritage: https://www.tiktok.com/@mijewishheritage

Twitter in Trouble? Why you should Archive your Tweets

themadprogramer — Mon, 05 Dec 2022 17:04:49 +0000

Twitter has seen some radical restructuring since Elon Musk’s acquisition over a month ago. Now is as good a time as ever to talk about what options you have for archiving or preserving your Twitter content.

This new era of Twitter has been quite turbulent, to say the least. More than half of the workforce has been fired or has quit, and site functionality is becoming unstable, as reported by the Seattle Times. Mastodon has emerged as a serious Twitter alternative. In fact, some of those who have departed Twitter now have their own Mastodon instance over at macaw.social. Personally, I am excited about the rise of Mastodon as an alternative, as I have been posting Data Horde updates over at @[email protected] for about two years now.

So, why not leave Twitter behind and move on? Now, Twitter allows you to request a copy of your personal data: Tweets and all. But it’s probably hard to leave a site that you have been on for over a decade. Especially, when requesting your personal archive is not even working correctly. Many people have reported that archive requests are being ignored or processed with delay. On a test account, we at Data Horde found that it took over 3 days to receive a personal archive.

Tweeters complaining about being unable to export personal archives: view snapshot at archive.is

In 2022 this is a big deal, not only for archivists but also legally. Article 15 of the GDPR gives users (i.e. data subjects) the right to obtain a copy of their collected data upon request. Outside of Europe, California’s CCPA has a similar clause protecting the right to know.

There are repercussions for not respecting these rules. Recently another messaging app, Discord, was fined 800 000 euros by the French regulator CNIL for failing to respect data retention periods and security of personal data. That was actually a reduced fine, given Discord’s conciliatory attitude. If Twitter does not up their game, they may meet a similar fate, if not a worse one.

Now that I have your attention, I would like to direct it to the help page on how to request a personal archive from Twitter: https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive . Even if a bit unstable, this is what you need to follow to save a copy of your Tweets.

The Twitter archive is big and burly but not perfect. Johan van der Knijff recently wrote a blogpost on some shortcomings, such as the t.co URL-shortener and some workarounds: https://www.bitsgalore.org/2022/11/20/how-to-preserve-your-personal-twitter-archive
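To give a rough idea of what such a workaround can look like, here is a minimal Python sketch. It is not van der Knijff’s own script. It assumes that the archive’s `data/tweets.js` file is a JavaScript assignment wrapping a JSON array, and that each tweet carries an `entities.urls` list with `url` and `expanded_url` fields. The function names and the sample data are made up for illustration.

```python
import json


def load_tweets_js(text):
    """Parse the contents of tweets.js into a list of tweet dicts.

    Assumption: the file looks like `window.YTD.tweets.part0 = [...]`,
    i.e. a JavaScript assignment followed by a plain JSON array.
    """
    start = text.index("[")  # skip the JavaScript assignment prefix
    return [item.get("tweet", item) for item in json.loads(text[start:])]


def expand_short_links(tweet):
    """Return the tweet text with t.co links replaced by their targets."""
    text = tweet.get("full_text", "")
    for link in tweet.get("entities", {}).get("urls", []):
        short, full = link.get("url"), link.get("expanded_url")
        if short and full:
            text = text.replace(short, full)
    return text


# Tiny made-up sample in the assumed format, not real archive data.
SAMPLE = """window.YTD.tweets.part0 = [
  {"tweet": {"full_text": "Read this https://t.co/abc123",
             "entities": {"urls": [{"url": "https://t.co/abc123",
                                    "expanded_url": "https://example.org/post"}]}}}
]"""

if __name__ == "__main__":
    for tweet in load_tweets_js(SAMPLE):
        print(expand_short_links(tweet))
```

If your own archive is laid out differently, the field names above will need adjusting; check a tweet or two by hand before trusting the output.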

Oh, and by the way. It gets worse: Elon Musk has also stated interest in purging inactive accounts and their Tweet history.

Definitely
— Elon Musk (@elonmusk) November 1, 2022

Archive Snapshot: https://archive.ph/hcKsV

This might not seem like a big deal, except to the one or two of our readers who periodically scrape politician accounts off of https://ballotpedia.org. Yet it is actually a serious turning point. Currently, Twitter does not purge inactive accounts, except in the event of death or incapacitation and by special request.

In 2019 there was an attempted Twitter policy change to expire accounts which had not been logged into for 6 months. This sparked outrage across the platform by those who saw this as unfair to the memory of inactive accounts. In particular, fans of deceased K-Pop artist Kim Jong-hyun, otherwise known as Jonghyun (김종현/종현), came to the defence of his legacy, overturning the attempt altogether. Turning back on this decision would go against all of that heritage: people’s heritage, Twitter’s heritage, web heritage. Alas, this is the projected course of things; even if we cannot prevent it, it is perhaps our duty to protest why it is wrong.

What about the extreme scenario of a total collapse of Twitter? What does that mean for web history? Well, the good news is that people have been thinking about this since long before this year.

Already in 2010 the Library of Congress announced that they would be copying the entire internal archive of Twitter, starting from March 2006.

Library to acquire ENTIRE Twitter archive — ALL public tweets, ever, since March 2006! Details to follow.
— Library of Congress (@librarycongress) April 14, 2010

Archive Snapshot: https://web.archive.org/web/20161208074132/https://twitter.com/librarycongress/statuses/12169442690
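The snapshot link above follows the usual Wayback Machine pattern: a 14-digit capture timestamp followed by the original address. As a small illustration, the Python sketch below splits such a link back into those two parts. The function name `parse_snapshot_url` is our own invention, and the pattern only covers the plain form seen above, not every variant of Wayback link.

```python
import re
from datetime import datetime

# Plain snapshot links: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_snapshot_url(snapshot_url):
    """Split a Wayback snapshot link into (capture time, original URL)."""
    match = WAYBACK_PATTERN.match(snapshot_url)
    if match is None:
        raise ValueError(f"not a plain Wayback snapshot link: {snapshot_url}")
    timestamp, original = match.groups()
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original


if __name__ == "__main__":
    captured, original = parse_snapshot_url(
        "https://web.archive.org/web/20161208074132/"
        "https://twitter.com/librarycongress/statuses/12169442690"
    )
    print(captured.isoformat(), original)
```

This is handy when sorting through a pile of saved snapshot links, since it tells you at a glance when each capture was made and which page it refers to.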

There are also many smaller grabs on the Internet Archive and archive.today, some of which you have seen linked above. Special mention goes to Archive Team’s periodical Twitter Stream archive.

Last but not least, you can help! The Internet Archive is collecting Tweet dumps from people as we speak: https://archive.org/services/wayback-gsheets/archive-your-tweets Whether you just want extra insurance for your back-up, or to contribute to the wealth of the web you can help by using the above tool to upload your Tweets to the Internet Archive for generations to come.

Archive95: The Old Man Internet

glmdgrielson — Thu, 21 Jul 2022 23:28:13 +0000

The internet is kind of old. To be fair, so is the Internet Archive and its Wayback Machine. But IA isn’t older than the internet (how could it be?) so there are some things that could slip through the cracks. Things before its founding in 1996, for example.

Then comes along Archive95 which is an archive of the pre-IA internet of 1995. It primarily uses two sources, the World Wide Web Directory and the German language Einblick ins Internet, to give an impression of an era when the web was small and monitors were bulky as heck.

– glmdgrielson, a young whippersnapper

Remembering YouTube’s Lost Unlisted Videos

themadprogramer — Thu, 12 May 2022 22:55:50 +0000

Melinda teaches high school in the Bay Area and recently reached out to us with a problem. Her students just finished a video history project that she wanted to share with their parents and classmates. But she was concerned about posting the videos publicly because she didn’t want the whole world to find them (frankly, neither did her students). Melinda told us YouTube’s private sharing options — a 25-person cap that’s limited to other YouTube users — didn’t work for her. She needed a better option to privately share her students’ talent.
Later today, we’ll be rolling out a new choice that will help Melinda and other people like her: unlisted videos.
Jen Chen, Software Engineer at Google, https://blog.youtube/news-and-events/more-choice-for-users-unlisted-videos/

On this day, 12 years ago, YouTube introduced unlisted videos as a compromise between a public and a private video. Perfect for sharing your history project with friends, video outtakes, or just about anything you didn’t want cluttering your channel.

Some time later, a non-targeted exploit was discovered which could reveal the links of existing YouTube videos, but not the content itself. So in 2017, YouTube changed how links were generated to make links more unpredictable. It could have ended there, but it didn’t.

YouTube will Private Old Unlisted Videos Next Month

Years later, in 2021, YouTube decided that having their links be hypothetically predictable might be problematic for old unlisted videos. So they decided to haphazardly and automatically private old unlisted videos uploaded prior to 2017.

Users were offered an option to opt out, if their channels were still active AND they acted within a month of the announcement. Unfortunately, millions of videos were lost in the name of security. Vlogs, school projects, outtakes, Patreon videos; things people wanted to share and had deliberately NOT made private.

Is there any silver lining to all of this? Not all is lost. There are collections like filmot which offer a non-invasive database of metadata on these unlisted videos, minus the videos themselves. There was also a project by Archive Team to archive a few TBs of unlisted videos, even if only a small sample. More than anything, YouTubers have been posting re-uploads, in the case of inactive channels and/or significant unlisted videos.

Not to sound like a beggar, but we would really appreciate it if you could share this short blog post. Almost one year later this situation has still not become common knowledge. Also be sure to check out our unlisted video countdown from last year:

2/5 Thank you for sticking with us these past 30 days, here's a complete playlist if you'd like to watch all the videos on YouTube while you still can.

https://t.co/pXyzDl9BfD

Note that you should go to https://t.co/BLlMJfP9gb for videos with annotations.
— Data Horde (@DataHordeBlog) July 22, 2021

Pulling Rank: The Legacy of Alexa Internet

themadprogramer — Fri, 29 Apr 2022 17:25:26 +0000

Alexa Internet and the Internet Archive, two seemingly unrelated entities, have been partners ever since their inception. Alexa’s sunset scheduled for 1 May 2022 is, therefore, also a loss for the web archiving community. As a small send-off to Alexa, here is the story of two twins who grew apart together.

Today, the internet has become such a big part of our lives that it’s hard to imagine a time without it. Yet only 30 years ago, the internet was hardly accessible to anyone. Not in the sense that it wasn’t affordable, rather what could be called the internet wasn’t very inter-connected. You had separate networks: ARPANET, which was heavily linked to the US’s military-industrial complex; FidoNet, which was a worldwide network connecting BBSs; USENET, which were newsgroups mostly adopted on university campuses… Each network had a particular use-case and was often restricted to a particular demographic. It wouldn’t be until the vision of an “open web” that a common internet would emerge.

In the early 90s, many disillusioned DARPA-contractors began leaving ARPANET on an exodus to San Francisco, synergising with the city’s pre-established tech ecosystem. Maybe it was the advent of new protocols such as Gopher and the World Wide Web. Perhaps it was the growing Free Software Movement. Not to mention gravitation towards the technology clusters of Silicon Valley or the Homebrew Computer Club. It was more than happenstance that California, and the San Francisco Bay Area in particular, had become home to a lot of network engineering experts.

The tricky question wasn’t how to get the internet to more people, it was how to do it the fastest. Many small companies, startups, and even NGOs popped up in San Francisco to address the different challenges of building a massive network. From building infrastructure by laying wires, to law firms for dealing with bureaucracy. Of course, there were also companies dealing with the software problems on top of hardware.

Alexa Internet Logo (1997)

One such company was Alexa Internet, founded by Bruce Gilliat and Brewster Kahle. Alexa started as a recommendation system, to help users find relevant sites without them having to manually search everything. On every page, users would get a toolbar showing them “recommended links”. You may think of these recommended webpages, like suggested videos on YouTube or songs on Spotify. Alexa was “free to download” and came with ads.

Those recommendations had to come from somewhere and Alexa wasn’t just randomised or purely user-based. Their secret was collecting snapshots of webpages through a certain crawler, named ia_archiver, more on that later. This way they were able to collect stats and metrics on webpages themselves, over time. This is how Alexa’s most well-known feature, Alexa Rank, came to be. Which sites are the most popular, in which categories and when? Over time, this emphasis on Web Analytics became Alexa’s competitive advantage.
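As a side note for site owners: crawlers usually announce themselves by name, and a site’s robots.txt file can address a crawler by that name. The Python sketch below uses the standard library’s `urllib.robotparser` to check what a made-up robots.txt would say to an agent called `ia_archiver`. The rules and the example.org addresses are invented for illustration, and the sketch makes no claim about how Alexa’s crawler actually treated such rules.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that singles out one crawler by name.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "https://example.org/private/page.html"),
        ("ia_archiver", "https://example.org/index.html"),
        ("SomeOtherBot", "https://example.org/private/page.html"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```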

Alexa was a successful business, only to keep growing, but founder Brewster Kahle had something of an ulterior motive. He was also in the midst of starting a non-profit organisation called the Internet Archive. ia_archiver did, in fact, stand for internetarchive_archiver. All the while Alexa was amassing this web data, it was also collecting it for long-term preservation at this up-and-coming Internet Archive. In fact, one can tell the two were interlinked ideas from the very start; as the name, Alexa, was an obvious nod to the Library of Alexandria. At one point, Alexa -not the Internet Archive- made a donation of web data to the US Library of Congress, as a bit of a publicity stunt to show the merit of what they were doing.

[For your video], there is this robot sort of going and archiving the web, which I think is somewhat interesting towards your web history. It’s a different form. You’re doing an anecdotal history. The idea is to be able to collect the source materials so that historians and scholars will be able to do a different job than you are now.
Brewster Kahle, teasing his vision for the Internet Archive in an interview by Marc Weber (Computer History Museum) from 1996. Fast-forward to 31:53 in the video below.

Tim Požar and Brewster Kahle CHM Interview by Marc Weber; October 29 1996.
Mirror on Internet Archive: https://archive.org/details/youtube-u2h2LHRFbNA

For the first few years, Alexa and the IA enjoyed this dualistic nature. One side being the for-profit company and the other a charitable non-profit, both committed to taking meta-stats on the wider internet. This came to a turning point in 1999, when Amazon decided to acquire Alexa Internet (not the smart home product) for approx. US$250 million. Alexa needed growth and the IA needed funding, so it was a happy day for everyone, even if it meant that the two would no longer act as a single entity.

Kahle left the company to focus on the IA and former-partner Gilliat ended up becoming the CEO of Alexa. An arrangement was reached so that even after the acquisition, Alexa would continue donating crawled data to supply the Internet Archive. Their collaborator Tim Požar, who you might recognize from the ’96 interview from above, would remain at Alexa for some time as a backend engineer. A lot of what Požar did was ensuring that Alexa’s crawled data would continue to be rerouted to the Internet Archive. A lot of these data dumps are now visible under the IA’s Alexa crawls collection.

Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.

Afterwards, the IA and Alexa went their separate ways. The Internet Archive expanded to non-web digital collections as well. Books, in particular. The web archive part was dubbed the Wayback Machine.

By 2001, the Internet Archive was no longer a private collection but was made open to the public for browsing. The Internet Archive really lived up to its name and became the de facto hub for archiving on the web. Ever since, the IA has continued to attract not only readers, but also contributors who keep growing the collections.

As for Alexa, Amazon’s bet paid off as they dominated web analytics for the coming years. Alexa rankings became the standard metric when comparing web traffic, for example on Wikipedia. Alexa listed some public stats free to all, but remained profitable thanks to a tiered subscription system. If you needed to know the 100 largest blog sites in a given country, Alexa was your friend. Then you could pay a few dollars extra to find out what countries were visiting your competitors the most. Alexa was great, so long as you were interested in web-sites.

Alexa Stats for Data Horde, April 28 2021

Alexa was born in a very different web. A web of sites. Yet today’s web is a web of apps. Social media, streaming services… The statistics of this web of apps are kept by centralised app markets such as Google Play and Apple’s App Store. Alexa tried to adapt; for example, they changed traffic stats to rely less on crawl data across the entire web and to also draw on shares posted to Twitter and Reddit. Sadly these changes have not been impactful enough to save Alexa from obsolescence.

(Google Search Trend showing the rise and fall of alexa rank, alternative link.)

Amazon telegraphed their intent to either adapt or shut down by gradually dropping features over the past few months. For example, they replaced Browse by Category with a narrower Articles by Topic. Finally, the service closure was announced in December 2021.

Webarchive for Alexa Categories page redirecting to Articles

So what will happen now? The closing of Alexa is different from most shutdowns because it’s not only the loss of data itself, but a data stream. Alexa was, indeed, at a time a web crawling powerhouse. Yet it’s no longer uncontested. We still have, for example, Common Crawl which also came out of Amazon, interestingly. As for the Internet Archive, they have many partners and collaborators to continue crawling the web as well, so they won’t be alone.

Alexa was also valuable in its own right. Though there are new competitors for web analytics, you won’t see many investigating global/regional popularity, or different categories. Even so, there aren’t very many services interested in overall web traffic, as opposed to site analytics. On top of this, Alexa ran for 25 years. That’s a quarter of a century of historical data on what sites rose and fell before Alexa, unavailable almost anywhere else. Almost.

Just as Alexa helped the Internet Archive grow, from this point, the Internet Archive shall reciprocate by keeping the memory of Alexa alive. Not just the sites crawled by Alexa, but also in snapshots of public statistics gathered by Alexa.

If you have an Alexa account you can also help! Users can export Alexa data by following the instructions here! You can bet any and all data would be very valuable, either on the Internet Archive or elsewhere. Please make sure you act quickly, as there isn’t much time left until May 1.