archiving – Data Horde https://datahorde.org Join the Horde! Thu, 16 Nov 2023 09:41:02 +0000 en-US hourly 1 https://wordpress.org/?v=6.4.3 https://datahorde.org/wp-content/uploads/2020/04/cropped-DataHorde_Logo_small-32x32.png archiving – Data Horde https://datahorde.org 32 32 Without being exploited: What archivists should learn from the XeNTaX forums aftermath https://datahorde.org/without-being-exploited-what-archivists-should-learn-from-the-xentax-forums-aftermath/ https://datahorde.org/without-being-exploited-what-archivists-should-learn-from-the-xentax-forums-aftermath/#comments Thu, 16 Nov 2023 09:40:28 +0000 https://datahorde.org/?p=2923 Some 6 months ago, in May 2023, a post was made on r/DataHoarder that the XeNTaX wiki and forum were shutting down due to financial considerations. As with any forum shutdown, much panic ensued. However, of the few people I have spoken to about this shutdown, no one really seemed to be aware of XeNTaX before this.

Depending on where you look online, you may be led to believe XeNTaX is/was a company, supposedly a foundation and definitely a website. Yes, there is a XeNTaX website, xentax.org, distinct from the XeNTaX forums at forum.xentax.com. In actuality, XeNTaX has its roots in the Dutch demoscene and it has just kept reincarnating.

A Xentax song composed for the X’98 compo

XeNTaX started as a team of two, Mr. Mouse and Captain Corney, who were hacking/modding Commodore 64 games. XeNTaX grew into a much wider community over time because Mr. Mouse and Captain Corney wanted to be able to focus on retrocomputing and to support others working on similar projects. To this end, XeNTaX developed MultiEx Commander, a tool for unarchiving 100+ retro game formats, certainly no longer limited to the C64.


On October 6, XeNTaX made a more upfront shutdown announcement [Wayback], with the shutdown being scheduled for the end of the year. While there was still some possibility of a buyout or handover, it was unlikely. Instead, the XeNTaX community was encouraged to join the XeNTaX Discord server. Again, no surprises there: it has become fairly routine for old forums to retire to Discord, which offers free hosting and a ton of features.

With this announcement, a second wave of panic broke out. Word got out once again, leading to several mass archiving efforts. However, this upset the staff enough to issue a warning on the Discord, with an emphasis on data privacy and consent. To quote Mr. Mouse:

Note: Members of the Xentax Forum have agreed to terms of the Forum and any public information. They have not agreed for their information being used on other sites. You may wish to look into the subject of data privacy. As such, while you’ve leeched my posts, I did not agree for those being hosted somewhere else. So remove my posts.

Remember to ensure approval from people before you put their stuff up that they did not agree to. In this age of data privacy and consent that is very important. As for Wayback Machine, they have a process that enables removal of pages if asked and are usually collaborative.

XeNTax Discord

This was a remarkable reaction because two things are being said here. First is the obvious point on data privacy and consent, but second is an undertone of leeching off of previous work and exploitation. The fact that the XeNTaX forums have shut down does not mean that the staff and contributors have quit completely. They are still around and will frown upon their work being plagiarised now just as much as they would have while the forums were alive. And that is an issue most fellow archivists and hoarders have been fairly negligent of.


Amidst the archiving craze focussed on preserving the record, there was also a second preservation effort going on: an effort to preserve community. Although the XeNTaX Discord server offered a solution, many did wish for an independent forum. A short GoFundMe was even run to see if maintenance costs could be crowdsourced.

The shutdown date was pulled a bit forward to November 3, 2023 as members were instructed to relocate to a new forum, Reshax, per the updated XeNTaX forum banner[Wayback]. In fact, when the forum did first shut down it began immediately redirecting to Reshax.

I’ve reached an agreement with Mr. Mouse, the owner of the Xentax forum, to promote ResHax and breathe new life into the slowly declining forum. Additionally, I’ll make an effort to bring tools from their site to ours. Once their forum becomes inactive, I’ll attempt to persuade Mr. Mouse to redirect the domain to our forum, ensuring that all users can find a new home here

Reshax admin michalss, “What about Xentax and Zenhax ?” on ResHax, Wayback Snapshot.

michalss also lamented the recent death of the sister community Zenhax, which was abandoned due to the owner losing interest. And this could have been the end of the story, but people kept begging, asking “where are the tools, where are the assets?”…


On November 8, XeNTaX Discord admin Richard Whitehouse came out with an announcement, later also shared on his homepage: Reshax and XeNTaX had reached an alternative agreement. From this point on, Reshax would be free to focus on reverse engineering however they pleased, and XeNTaX members would be free to continue the tools and projects that they were already making. Whitehouse paints a picture of how he believes the XeNTaX community has been unfairly taken advantage of, and that this was a destructive force.

Many developers stopped sharing their findings and specifications (myself included) because they started to see their work exploited. By companies, which is morally reprehensible (and sometimes in direct violation of a given license/copyright) and serves to devalue the entire skillset associated with the labor. By other developers, who are socially positioned to exploit the labor in some other way. By people who just want to rip content to turn around and sell it, or claim false credit for it. In conjunction with unhealthy ego competition, this exploitation has made it impossible to create a culture of trust and sharing between developers.

We want to create an environment where developers are safe to work together without being exploited, and where developers feel valued by fellow developers enough to not feel the need to engage in pathetic ego-based assertions of skill. We want people to be fueled by their creative ambitions and technical fascinations, not their social standing. We want to create a culture beyond what Open Source can achieve under the constraints of our current socioeconomic systems. No matter how many people are left standing in the end, this is where we’re going.

Richard Whitehouse

On r/DataHoarder and other venues, the XeNTaX forum shutdown was treated as nothing more than a lost cause. There was once a XeNTaX, now there isn’t; we must therefore uphold the memory through downloading all we can. But to the alive and well XeNTaX community, these forum dumps were nothing more than an intensification of the routine stealing of their work they had grown sick of. Whitehouse’s open letter, which I have only abridged here, makes it clear whom the Discord staff consider a XeNTaX contributor: someone willing to invest time and effort to learn, as opposed to internet passersby who ask for something, take it and move on.

To further hammer in the point, Mr. Mouse issued another announcement on November 12 imploring members not to share full backups of the XeNTaX Forum on the XeNTaX Discord server. Once again, the Internet Archive and the Wayback Machine were exempted as special cases, but otherwise it was not allowed. This, however, did attract some internal protest from guild members, as one might gather from the reactions to the message.

This goes to show that the Internet Archive has built up enough of a reputation to not merely be branded as leechers and pirates, and that’s a good thing. Although, there is an implication here that websites just find their way onto the Internet Archive, when in fact there are automated processes and groups like Archive Team who facilitate this. Thus we find ourselves in a Catch-22, where if something has landed on the Internet Archive it is deemed legitimate, but if it is stuck in transit it was stolen unfairly.

This is a paradox that underpins the challenge of being an archivist today: success means being invisible and never having your archives widely distributed. Does that perhaps sound familiar? It’s the exact same situation the XeNTaX community finds itself in. They would rather preserve their tools and assets internally, circulating them on a need-to-know basis, than have them out in the open. This ensures that the community retains its knowledge, but also controls it. It’s self-determination against potential exploitation.


The XeNTaX situation is not over, and hopefully it will not be over any time soon. The XeNTaX forums might be gone, but XeNTaX lives on. And I believe it sets a good example: archivism as a hobby or profession is something which should prevail within every community, instead of the interventionist culture from third parties that we have grown accustomed to today.

But that reversal we have is warranted. Many times communities do vanish or are made to vanish, whether it’s subtitlers on YouTube or artists who can no longer use Macromedia Flash. Oftentimes, these communities do not have an obvious way of preserving their memories; the decision is out of their control and attempts at preservation necessitate challenging authority, ad hoc solutions and technical expertise (often from outside).

Whether you define yourself as an archivist, a hoarder, a pirate, a cracker, an archaeologist or whatever, it is a must that you understand where the files come from. You don’t have to obey all of the wishes of the original creator, but you have to respect them. Especially if they’re still alive and kicking. The costs couldn’t kill XeNTaX, but from the looks of it archivists almost did.

Data Log 2023-10-06 Weekly News https://datahorde.org/data-log-2023-10-06-weekly-news/ https://datahorde.org/data-log-2023-10-06-weekly-news/#comments Fri, 06 Oct 2023 21:52:46 +0000 https://datahorde.org/?p=2915 Discord CDN – FIFA delisting – Nintendo online shutdowns – Typepad – Goodboy Galaxy – Matrix

BG: Jungle Waterfalls by Mark Ferrari.

Music: Meadow Breeze written by TECHNOTRAIN on DOVA-SYNDROME.

Data Log 2023-09-30 Intro to Phone Preservation https://datahorde.org/data-log-2023-09-30-intro-to-phone-preservation/ https://datahorde.org/data-log-2023-09-30-intro-to-phone-preservation/#respond Sun, 01 Oct 2023 21:09:44 +0000 https://datahorde.org/?p=2908 What did you have before smartphones? Madpro and Donut talk about old phones, retro phones and retrofuturistic phones.

Data Log 2023-09-17 Unity Platform Runtime Fee Controversy https://datahorde.org/data-log-2023-09-17-unity-fee-controversy/ https://datahorde.org/data-log-2023-09-17-unity-fee-controversy/#respond Sun, 17 Sep 2023 23:07:03 +0000 https://datahorde.org/?p=2901 The Unity Engine is a popular 3D engine for making games and other interactive media. In this episode of Data Log glmdgrielson and madpro talk about how game designers and gamers are upset with the Unity platform’s new payment scheme.

Data Log 2023-01-26 What is Archiving? https://datahorde.org/2023-01-26-what-is-archiving/ https://datahorde.org/2023-01-26-what-is-archiving/#respond Thu, 26 Jan 2023 23:37:35 +0000 https://datahorde.org/?p=2876 The first ever episode of Data Log: The Archiver’s Favorite Podcast. Learn about what archiving is and how to join the archiving community!

Twitter in Trouble? Why you should Archive your Tweets https://datahorde.org/twitter-in-trouble-why-you-should-archive-your-tweets/ https://datahorde.org/twitter-in-trouble-why-you-should-archive-your-tweets/#comments Mon, 05 Dec 2022 17:04:49 +0000 https://datahorde.org/?p=2852 Twitter has seen some radical restructuring since Elon Musk’s acquisition over a month ago. Now is as good a time as ever to talk about what options you have for archiving or preserving your Twitter content.


This new era of Twitter has been quite turbulent, to say the least. More than half of the workforce has been fired or has quit, and site functionality is becoming unstable, as reported by the Seattle Times. Mastodon has emerged as a serious Twitter alternative. In fact, some of those who have departed Twitter now have their own Mastodon instance over at macaw.social. Personally, I am excited about the rise of Mastodon as an alternative, as I have been posting Data Horde updates over at @[email protected] for about two years now.

So, why not leave Twitter behind and move on? Twitter does allow you to request a copy of your personal data: Tweets and all. But it’s probably hard to leave a site that you have been on for over a decade, especially when requesting your personal archive is not even working correctly. Many people have reported that archive requests are being ignored or processed with delay. On a test account, we at Data Horde found that it took over 3 days to receive a personal archive.

Tweeters complaining about being unable to export personal archives: view snapshot at archive.is

In 2022 this is a big deal, not only for archivists but also legally. Article 15 of the GDPR gives users (i.e. data subjects) the right to obtain a copy of the data collected about them upon request. Outside of Europe, California’s CCPA has a similar clause protecting the right to know.

There are repercussions for not respecting these rules. Recently another messaging app, Discord, was fined 800 000 euros by the French regulator CNIL for failing to respect data retention periods and security of personal data. That was actually a reduced fine, given Discord’s conciliatory attitude. If Twitter does not up their game, they may meet a similar fate, if not a worse one.

Now that I have your attention, I would like to direct it to the help page on how to request a personal archive from Twitter: https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive . Even if a bit unstable, this is what you need to follow to save a copy of your Tweets.

The Twitter archive is big and burly but not perfect. Johan van der Knijff recently wrote a blog post on some shortcomings, such as the t.co URL shortener, and some workarounds: https://www.bitsgalore.org/2022/11/20/how-to-preserve-your-personal-twitter-archive
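To make the t.co problem a bit more concrete, here is a minimal Python sketch of one possible workaround: rewriting the shortened links in a Tweet’s text with the original URLs stored alongside it. This is not taken from the blog post above. The field names `full_text`, `entities`, `urls`, `url` and `expanded_url` are assumptions about how a Tweet record looks once loaded from the export, and `expand_tco_links` is a hypothetical helper name, so check both against your own archive before relying on it.

```python
def expand_tco_links(tweet: dict) -> str:
    """Return the Tweet text with t.co short links swapped for their targets.

    Assumes (unverified) that each link entry carries both the short
    `url` and the original `expanded_url`.
    """
    text = tweet.get("full_text", "")
    for link in tweet.get("entities", {}).get("urls", []):
        short = link.get("url")
        expanded = link.get("expanded_url")
        # Only rewrite when both halves of the mapping are present.
        if short and expanded:
            text = text.replace(short, expanded)
    return text


# Hypothetical sample record, made up for illustration.
sample = {
    "full_text": "New post on web archiving https://t.co/abc123",
    "entities": {
        "urls": [
            {
                "url": "https://t.co/abc123",
                "expanded_url": "https://example.org/post",
            }
        ]
    },
}

print(expand_tco_links(sample))
```

The point of doing this offline is that the rewritten text stays readable even if the t.co redirect service itself ever stops resolving.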


Oh, and by the way. It gets worse: Elon Musk has also stated interest in purging inactive accounts and their Tweet history.

Archive Snapshot: https://archive.ph/hcKsV

This might not seem like a big deal, except to the one or two of our readers who periodically scrape politician accounts off of https://ballotpedia.org. Yet it is actually a serious turning point. Currently, Twitter does not purge inactive accounts, except in the event of death or incapacitation and by special request.

In 2019 there was an attempted Twitter policy change to expire accounts which had not been logged into for 6 months. This sparked outrage across the platform from those who saw this as unfair to the memory of inactive accounts. In particular, fans of deceased K-Pop artist Kim Jong-hyun, otherwise known as Jonghyun (김종현/종현), came to the defence of his legacy, overturning the attempt altogether. Going back on this decision would go against all of that heritage: people’s heritage, Twitter’s heritage, web heritage. Alas, this is the projected course of things; even if we cannot prevent it, it is perhaps our duty to protest why it is wrong.


What about the extreme scenario of a total collapse of Twitter? What does that mean for web history? Well, the good news is that people have been thinking about this since long before this year.

Already in 2010 the Library of Congress announced that they would be copying the entire internal archive of Twitter, starting from March 2006.

Archive Snapshot: https://web.archive.org/web/20161208074132/https://twitter.com/librarycongress/statuses/12169442690

There are also many smaller grabs on the Internet Archive and archive.today, some of which you have seen linked above. Special mention goes to Archive Team’s periodic Twitter Stream archive.

Last but not least, you can help! The Internet Archive is collecting Tweet dumps from people as we speak: https://archive.org/services/wayback-gsheets/archive-your-tweets . Whether you just want extra insurance for your back-up, or to contribute to the wealth of the web, you can help by using the above tool to upload your Tweets to the Internet Archive for generations to come.

Archive95: The Old Man Internet https://datahorde.org/archive95-the-old-man-internet/ https://datahorde.org/archive95-the-old-man-internet/#respond Thu, 21 Jul 2022 23:28:13 +0000 https://datahorde.org/?p=2811 The internet is kind of old. To be fair, so is the Internet Archive and its Wayback Machine. But IA isn’t older than the internet (how could it be?) so there are some things that could slip through the cracks. Things before its founding in 1996, for example.

Then comes along Archive95 which is an archive of the pre-IA internet of 1995. It primarily uses two sources, the World Wide Web Directory and the German language Einblick ins Internet, to give an impression of an era when the web was small and monitors were bulky as heck.

– glmdgrielson, a young whippersnapper

Remembering YouTube’s Lost Unlisted Videos https://datahorde.org/remembering-youtubes-lost-unlisted-videos/ https://datahorde.org/remembering-youtubes-lost-unlisted-videos/#comments Thu, 12 May 2022 22:55:50 +0000 https://datahorde.org/?p=2799

Melinda teaches high school in the Bay Area and recently reached out to us with a problem. Her students just finished a video history project that she wanted to share with their parents and classmates. But she was concerned about posting the videos publicly because she didn’t want the whole world to find them (frankly, neither did her students). Melinda told us YouTube’s private sharing options — a 25-person cap that’s limited to other YouTube users — didn’t work for her. She needed a better option to privately share her students’ talent.

Later today, we’ll be rolling out a new choice that will help Melinda and other people like her: unlisted videos.

Jen Chen, Software Engineer at Google, https://blog.youtube/news-and-events/more-choice-for-users-unlisted-videos/

On this day, 12 years ago, YouTube introduced unlisted videos as a compromise between a public and a private video. Perfect for sharing your history project with friends, video outtakes, or just about anything you didn’t want cluttering your channel.

Some time later, a non-targeted exploit was discovered which could reveal the links of existing YouTube videos, but not the content itself. So in 2017, YouTube changed how links were generated to make them more unpredictable. It could have ended there, but it didn’t.

Years later, in 2021, YouTube decided that having their links be hypothetically predictable might be problematic for old unlisted videos. So they decided, rather haphazardly, to automatically set old unlisted videos uploaded prior to 2017 to private.

Users were offered an option to opt out, if their channels were still active AND they acted within a month of the announcement. Unfortunately, millions of videos were lost in the name of security. Vlogs, school projects, outtakes, Patreon videos; things people wanted to share and had deliberately NOT made private.

Is there any silver lining to all of this? Not all is lost. There are collections like Filmot which offer a non-invasive database of metadata on these unlisted videos, minus the videos themselves. There was also a project by Archive Team to archive a few TBs of unlisted videos, even if only a small sample. More than anything, YouTubers have been posting re-uploads, in the case of inactive channels and/or significant unlisted videos.


Not to sound like a beggar, but we would really appreciate it if you could share this short blog post. Almost one year later this situation has still not become common knowledge. Also be sure to check out our unlisted video countdown from last year:

Pulling Rank: The Legacy of Alexa Internet https://datahorde.org/pulling-rank-the-legacy-of-alexa-internet/ https://datahorde.org/pulling-rank-the-legacy-of-alexa-internet/#comments Fri, 29 Apr 2022 17:25:26 +0000 https://datahorde.org/?p=2772 Alexa Internet and the Internet Archive, two seemingly unrelated entities, have been partners ever since their inception. Alexa’s sunset scheduled for 1 May 2022 is, therefore, also a loss for the web archiving community. As a small send-off to Alexa, here is the story of two twins who grew apart together.


Today, the internet has become such a big part of our lives that it’s hard to imagine a time without it. Yet only 30 years ago, the internet was hardly accessible to anyone. Not in the sense that it wasn’t affordable; rather, what could be called the internet wasn’t very inter-connected. You had separate networks: ARPANET, which was heavily linked to the US’s military-industrial complex; FidoNet, which was a worldwide network connecting BBSs; USENET, a system of newsgroups mostly adopted on university campuses… Each network had a particular use-case and was often restricted to a particular demographic. It wouldn’t be until the vision of an “open web” that a common internet would emerge.

In the early 90s, many disillusioned DARPA contractors began leaving ARPANET on an exodus to San Francisco, synergising with the city’s pre-established tech ecosystem. Maybe it was the advent of new protocols such as Gopher and the World Wide Web. Perhaps it was the growing Free Software Movement. Not to mention gravitation towards the technology clusters of Silicon Valley or the Homebrew Computer Club. It was more than happenstance that California, and the San Francisco Bay Area in particular, had become home to a lot of network engineering experts.

The tricky question wasn’t how to get the internet to more people, it was how to do it the fastest. Many small companies, startups, and even NGOs popped up in San Francisco to address the different challenges of building a massive network. From building infrastructure by laying wires, to law firms for dealing with bureaucracy. Of course, there were also companies dealing with the software problems on top of hardware.

Alexa Internet Logo (1997)

One such company was Alexa Internet, founded by Bruce Gilliat and Brewster Kahle. Alexa started as a recommendation system, to help users find relevant sites without them having to manually search everything. On every page, users would get a toolbar showing them “recommended links”. You may think of these recommended webpages, like suggested videos on YouTube or songs on Spotify. Alexa was “free to download” and came with ads.

Those recommendations had to come from somewhere, and Alexa’s weren’t just randomised or purely user-based. Their secret was collecting snapshots of webpages through a crawler named ia_archiver (more on that later). This way they were able to collect stats and metrics on webpages themselves, over time. This is how Alexa’s most well-known feature, Alexa Rank, came to be. Which sites are the most popular, in which categories and when? Over time, this emphasis on web analytics became Alexa’s competitive advantage.

Alexa was a successful business that only kept growing, but founder Brewster Kahle had something of an ulterior motive. He was also in the midst of starting a non-profit organisation called the Internet Archive. ia_archiver did, in fact, stand for internetarchive_archiver. All the while Alexa was amassing this web data, it was also collecting it for long-term preservation at this up-and-coming Internet Archive. In fact, one can tell the two were interlinked ideas from the very start, as the name, Alexa, was an obvious nod to the Library of Alexandria. At one point, Alexa, not the Internet Archive, made a donation of web data to the US Library of Congress, as a bit of a publicity stunt to show the merit of what they were doing.

[For your video], there is this robot sort of going and archiving the web, which I think is somewhat interesting towards your web history. It’s a different form. You’re doing an anecdotal history. The idea is to be able to collect the source materials so that historians and scholars will be able to do a different job than you are now.

Brewster Kahle, teasing his vision for the Internet Archive in an interview by Marc Weber (Computer History Museum) from 1996. Fast-forward to 31:53 into the video below.
Tim Požar and Brewster Kahle CHM Interview by Marc Weber; October 29 1996.
Mirror on Internet Archive: https://archive.org/details/youtube-u2h2LHRFbNA

For the first few years, Alexa and the IA enjoyed this dualistic nature. One side being the for-profit company and the other a charitable non-profit, both committed to taking meta-stats on the wider internet. This came to a turning point in 1999, when Amazon decided to acquire Alexa Internet (not the smart home product) for approx. US$250 million. Alexa needed growth and the IA needed funding, so it was a happy day for everyone, even if it meant that the two would no longer act as a single entity.

Kahle left the company to focus on the IA and former-partner Gilliat ended up becoming the CEO of Alexa. An arrangement was reached so that even after the acquisition, Alexa would continue donating crawled data to supply the Internet Archive. Their collaborator Tim Požar, who you might recognize from the ’96 interview from above, would remain at Alexa for some time as a backend engineer. A lot of what Požar did was ensuring that Alexa’s crawled data would continue to be rerouted to the Internet Archive. A lot of these data dumps are now visible under the IA’s Alexa crawls collection.

Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.

Afterwards, the IA and Alexa went their separate ways. The Internet Archive expanded to non-web digital collections as well. Books, in particular. The web archive part was dubbed the Wayback Machine.

By 2001, the Internet Archive was no longer a private collection but was made open to the public for browsing. The Internet Archive really lived up to its name and became the de facto hub for archiving on the web. Ever since, the IA has continued to attract not only readers, but also contributors who keep growing the collections.


As for Alexa, Amazon’s bet paid off as they dominated web analytics for the coming years. Alexa rankings became the standard metric when comparing web traffic, for example on Wikipedia. Alexa listed some public stats free to all, but remained profitable thanks to a tiered subscription system. If you needed to know the 100 largest blog sites in a given country, Alexa was your friend. Then you could pay a few dollars extra to find out what countries were visiting your competitors the most. Alexa was great, so long as you were interested in websites.

Alexa was born in a very different web. A web of sites. Yet today’s web is a web of apps. Social media, streaming services… The statistics of this web of apps are kept by centralised app markets such as Google Play and Apple’s App Store. Alexa tried to adapt; for example, they changed traffic stats to be based not only on crawl data across the entire web, but also on shares posted to Twitter and Reddit. Sadly these changes have not been impactful enough to save Alexa from obsolescence.

(Google Search Trend showing the rise and fall of alexa rank, alternative link.)

Amazon telegraphed their intent to either adapt or shut down by gradually dropping features over the past few months. For example, they replaced Browse by Category with a narrower Articles by Topic. Finally, the service closure was announced in December 2021.

So what will happen now? The closing of Alexa is different from most shutdowns because it’s not only the loss of data itself, but a data stream. Alexa was, indeed, at a time a web crawling powerhouse. Yet it’s no longer uncontested. We still have, for example, Common Crawl which also came out of Amazon, interestingly. As for the Internet Archive, they have many partners and collaborators to continue crawling the web as well, so they won’t be alone.

Alexa was also valuable in its own right. Though there are new competitors for web analytics, you won’t see many investigating global/regional popularity, or different categories. Even so, there aren’t very many services interested in overall web traffic, as opposed to site analytics. On top of this, Alexa ran for 25 years. That’s a quarter of a century of historical data on what sites rose and fell before Alexa, unavailable almost anywhere else. Almost.

Just as Alexa helped the Internet Archive grow, from this point, the Internet Archive shall reciprocate by keeping the memory of Alexa alive. Not just the sites crawled by Alexa, but also in snapshots of public statistics gathered by Alexa.

If you have an Alexa account you can also help! Users can export Alexa data by following the instructions here! You can bet any and all data would be very valuable, either on the Internet Archive or elsewhere. Please make sure you act quickly, as there isn’t much time left until May 1.

Interview with Hubz of Gaming Alexandria https://datahorde.org/interview-with-hubz-of-gaming-alexandria/ https://datahorde.org/interview-with-hubz-of-gaming-alexandria/#respond Mon, 18 Apr 2022 09:09:30 +0000 https://datahorde.org/?p=2719 Hello, here’s another interview, this time with our head overlord Hubz of Gaming Alexandria.

glmdgrielson: So, first question, what is Gaming Alexandria?
Hubz: At its core it’s both a Discord community and a separate website dedicated to preserving various aspects of video games, such as scans, interviews, unreleased games, YouTube videos etc. It mainly started as a site where I could share high quality scans but has grown thanks to many people joining up with various skills to help expand the website. The Discord community itself is really an entity unto itself at this point where lots of gaming historians/preservationists have come together to share their works and also help each other out when needed with various projects. I love getting to see all the passion in everybody’s projects that they put forth and the willingness of the community to offer help when asked.

g: Tell me more about this community. I’m active in the server, but what does it look like from your end?
H: From an admin standpoint I have access to all the channels which include the private #staff and #mods channels where we discuss upcoming articles or projects for the site as well as handling the occasional argument or bad apple in the chat. Dylan Mansfeld (DillyDylan) handles a lot of great articles on undumped/prototype games that were previously unreleased. Ethan Johnson writes for his own blog (https://thehistoryofhowweplay.wordpress.com/) and Gaming Alexandria at times and is our editor so he glances through and cleans up all the articles that get posted. Jonas Rosland, who is the Executive Director of Hit Save (https://hitsave.org/), the NPO I’m a board member of, does a lot of thankless technical work behind the scenes, including a NAS he has set up not only for the staff of the website to store project files but for the community at large, which is a huge help. Wietse van Bruggen (Densy) handles a lot of the moderation of the chat and has been a huge help keeping the Discord community friendly and clean with his balanced moderation style. Last but not least there is Stefan Gancer (Gazimaluke) who did the original site redesign and has been a great idea man for ways to improve the site and community as time has gone on. For me personally I try to keep up with all the chat in the channels (though it can be tough at times!) just to have an idea of what’s going on and seeing what I can help with or connect people to further projects as well as post my scans and projects as they’re completed. Thanks to the rest of the staff I rarely have to step in and moderate which is very nice!

g: I’m going to skip over the omission of Norm and ask about the history of how the site has evolved.
H: LOL yes Norm is a menace to society and must be stopped.

Editor’s note: Hubz has a mock rivalry with Norm, a.k.a. the Gaming Historian, which is a frequent running gag on the server. I do not believe there is actual malice.

The website itself started officially on October 23rd, 2015 and was just a basic text website that I could easily upload to in order to share my scans; it was very barebones. The reason I wanted to get high quality scans out was due to using an emulator frontend called Hyperspin. For popular systems it had a lot of decent quality artwork for boxes. But for lesser known systems it was sorely lacking and that triggered my OCD and made me realize that scanning stuff in high resolution was something that needed to be done. Slowly, but surely, I met others, such as Densy, that wanted to scan in high quality and have their stuff hosted, and they would submit stuff. At some point I got involved with the VGPC Discord and met Kirkland, who had been quietly doing something similar with his collection. I collaborated with him and others on establishing scanning standards to use going forward, to have some level of consistent quality with those that were willing to do it, which eventually led to what is now https://scanning.guide/. In late 2018 the site was graciously redone by Gazimaluke and relaunched in the design you see now. We started branching out into actual articles written by our staff and releasing prototypes and unreleased games that we came across. The site continues doing this to this day, though we are branching out into more guest authors from the community posting interviews and articles as well in the near future.

g: As well as hosting my site, for which I am grateful. So, what is the day to day like for you?
H: Day to day on the scanning I try to get at least one magazine done daily. Doesn’t always happen but, in general, I debind a magazine the night before, then in the morning scan it in before leaving for work. If work gets slow I work on processing the scans, or else I’ll do it later that night and get them uploaded to the site and the Internet Archive.

g: Interesting. So how big do you think your archive is by this point?
H: Archive upload-wise I’m probably right around 2900 items if you count stuff that was removed lol. Then there’s a bunch on the site that wasn’t done to the higher scanning standards I go by now that’s not on the archive. So I’d guess in the 3000-4000 item range currently.

g: Do you know how big it is in terms of filesize?
H: Let me see real quick…
Looks like 2.5TB which is another reason I’m so thankful to have the Internet Archive to host my scans on due to the space and bandwidth that would be required otherwise.
The site alone usually has about half a TB of traffic per month so I can only imagine what it would be like if the magazine scans were also hosted directly on it.

g: Neat. Is there anything interesting that you got to be a part of due to GA that you would like to share?
H: Biggest thing is probably working with The Video Game History Foundation on scanning their extensive magazine collection so digital copies can be provided along with physical copies at their library. Being able to leverage the Internet Archive so people all over the world can easily access the magazines I’ve scanned that they might not have been able to easily otherwise is a great feeling personally for me. So many of these things are quite difficult to acquire and expensive as time goes on so having them as an ally in the preservation world is a godsend. There’s been lots of other connections and other projects I’ve worked on as well but I won’t ramble forever on that. Not only is Gaming Alexandria a tight community that likes to help each other out but there’s plenty of other preservation groups like VGHF, TCRF, and Hidden Palace just to name a few and we all get along great and try to push preservation forward together.
There’s so much work that needs to be done that we need all the help we can get and we need to support each other any way we can I think.

g: True that. Last question for now: anything that you would recommend to a would-be archivist?
H: I think it’s a good idea to preserve what interests you, which seems to go without saying, but I mean it more from a sense of not only going after what is popular. While you might not get much fanfare initially for the more obscure stuff it’s likely you’ll be the only one doing it and it’s important it’s being done. If you do good work for long enough it will get noticed, and to make good work easier it’s best to go with what you’re passionate about. The other thing I would suggest is not beating yourself up or comparing your output to others. Do what you can when you want to, this is a hobby after all. If you make yourself miserable trying to do something your output will naturally suffer or you might even burn out and stop altogether. Like I said before, we need all the help we can get, so try to avoid that if at all possible.

g: Thank you for being here, overlord Hubz. It’s been good talking to you.
H: No problem! Thanks for the interview. 🙂

– glmdgrielson, being a very good minion interviewer

]]>