Editorial – Data Horde https://datahorde.org Join the Horde!

Without being exploited: What archivists should learn from the XeNTaX forums aftermath
https://datahorde.org/without-being-exploited-what-archivists-should-learn-from-the-xentax-forums-aftermath/
Thu, 16 Nov 2023 09:40:28 +0000

Some 6 months ago, in May 2023, a post was made on r/DataHoarder announcing that the XeNTaX wiki and forum were shutting down due to financial considerations. As with any forum shutdown, much panic ensued. However, of the few people I have spoken to about this shutdown, no one really seemed to be aware of XeNTaX before this.

Depending on where you look online, you may be led to believe XeNTaX is/was a company, supposedly a foundation and definitely a website. Yes, there is a XeNTaX website, xentax.org, distinct from the XeNTaX forums at forum.xentax.com. In actuality, XeNTaX has its roots in the Dutch demoscene and it has just kept reincarnating.

A Xentax song composed for the X’98 compo

XeNTaX started as a team of two, Mr. Mouse and Captain Corney, who were hacking/modding Commodore 64 games. XeNTaX grew into a much wider community over time because Mr. Mouse and Captain Corney wanted to be able to focus on retrocomputing and to support others working on similar projects. To this end, XeNTaX developed MultiEx Commander, a tool for unarchiving 100+ retro game formats, certainly no longer limited to the C64.


On October 6, XeNTaX made a more upfront shutdown announcement [Wayback], with the shutdown scheduled for the end of the year. While there was still some possibility of a buyout or handover, it was unlikely. Instead, the XeNTaX community was encouraged to join the XeNTaX Discord server. Again, no surprises there: it has become fairly routine for old forums to retire to Discord, which offers free hosting and a ton of features.

With this announcement, a second wave of panic set off. Word got out once again, leading to several mass archiving efforts. However, this upset the staff enough to issue a warning on the Discord, with an emphasis on data privacy and consent. To quote Mr. Mouse:

Note: Members of the Xentax Forum have agreed to terms of the Forum and any public information. They have not agreed for their information being used on other sites. You may wish to look into the subject of data privacy. As such, while you’ve leeched my posts, I did not agree for those being hosted somewhere else. So remove my posts.

Remember to ensure approval from people before you put their stuff up that they did not agree to. In this age of data privacy and consent that is very important. As for Wayback Machine, they have a process that enables removal of pages if asked and are usually collaborative.

XeNTax Discord

This was a remarkable reaction because two things are being said here. First is the obvious point on data privacy and consent, but second is an undertone of leeching off of previous work and exploitation. The fact that the XeNTaX forums have shut down does not mean that the staff and contributors have quit completely. They are still around and will frown upon their work being plagiarised now just as much as they would have while the forums were alive. And that is an issue most fellow archivists and hoarders have been fairly negligent of.


Amidst the archiving craze focussed on preserving the record, there was also a second preservation effort going on: an effort to preserve community. Although the XeNTaX Discord server offered a solution, many did wish for an independent forum. A short GoFundMe was even run to see if maintenance costs could be crowdsourced.

The shutdown date was pulled a bit forward to November 3, 2023, as members were instructed to relocate to a new forum, Reshax, per the updated XeNTaX forum banner [Wayback]. In fact, when the forum first shut down, it immediately began redirecting to Reshax.

I’ve reached an agreement with Mr. Mouse, the owner of the Xentax forum, to promote ResHax and breathe new life into the slowly declining forum. Additionally, I’ll make an effort to bring tools from their site to ours. Once their forum becomes inactive, I’ll attempt to persuade Mr. Mouse to redirect the domain to our forum, ensuring that all users can find a new home here

Reshax admin michalss, “What about Xentax and Zenhax ?” on ResHax, Wayback Snapshot.

michalss also lamented the recent death of the sister community Zenhax, which was abandoned due to the owner losing interest. And this could have been the end of the story, but people kept begging, asking “where are the tools, where are the assets?”…


On November 8, XeNTaX Discord admin Richard Whitehouse came out with an announcement, later also shared on his homepage: Reshax and XeNTaX had reached an alternative agreement. From this point on, Reshax would be free to focus on reverse engineering however they pleased, and XeNTaX members would be free to continue the tools and projects that they were already making. Whitehouse paints a picture of how he believes the XeNTaX community has been unfairly taken advantage of, and that this was a destructive force.

Many developers stopped sharing their findings and specifications (myself included) because they started to see their work exploited. By companies, which is morally reprehensible (and sometimes in direct violation of a given license/copyright) and serves to devalue the entire skillset associated with the labor. By other developers, who are socially positioned to exploit the labor in some other way. By people who just want to rip content to turn around and sell it, or claim false credit for it. In conjunction with unhealthy ego competition, this exploitation has made it impossible to create a culture of trust and sharing between developers.

We want to create an environment where developers are safe to work together without being exploited, and where developers feel valued by fellow developers enough to not feel the need to engage in pathetic ego-based assertions of skill. We want people to be fueled by their creative ambitions and technical fascinations, not their social standing. We want to create a culture beyond what Open Source can achieve under the constraints of our current socioeconomic systems. No matter how many people are left standing in the end, this is where we’re going.

Richard Whitehouse

On r/DataHoarder and other venues, the XeNTaX forum shutdown was treated as nothing more than a lost cause. There was once a XeNTaX, now there isn’t; we must therefore uphold the memory through downloading all we can. But to the alive and well XeNTaX community, these forum dumps were nothing more than an intensification of the routine stealing of their work that they had grown sick of. Whitehouse’s open letter, which I have only abridged here, makes it clear whom the Discord staff consider a XeNTaX contributor: someone willing to invest time and effort to learn, as opposed to internet passersby who ask for something, take it and move on.

To further hammer in the point, Mr. Mouse issued another announcement on November 12 imploring members not to share full backups of the XeNTaX Forum on the XeNTaX Discord server. Once again, the Internet Archive and the Wayback Machine were exempted as special cases, but otherwise it was not allowed. This, however, did attract some internal protest from guild members, as one might gather from the reactions to the message.

This goes to show that the Internet Archive has built up enough of a reputation to not merely be heralded as leechers and pirates, and that’s a good thing. However, there is an implication here that websites just find their way onto the Internet Archive, when in fact there are automation processes and groups like Archive Team who facilitate this. Thus we find ourselves in a Catch-22, where if something has landed on the Internet Archive it is deemed legitimate, but if it is stuck in transit it was stolen unfairly.

This is a paradox that underpins the challenge of being an archivist today: success means being invisible and that your archives are never widely distributed. Does that perhaps sound familiar? It’s the exact same situation the XeNTaX community finds itself in. They would rather preserve their tools and assets internally, circulating on a need-to-know basis, than have them out in the open. This ensures that the community retains its knowledge, but also controls it. It’s self-determination against potential exploitation.


The XeNTaX situation is not over, and hopefully it will not be over any time soon. The XeNTaX forums might be gone, but XeNTaX lives on. And I believe it sets a good example: archivism as a hobby or profession is something which should prevail within every community, instead of the interventionist culture from 3rd parties that we have grown accustomed to today.

But that reversal we have is warranted. Many times communities do vanish or are made to vanish, whether it’s subtitlers on YouTube or artists who can no longer use Macromedia Flash. Oftentimes, these communities do not have an obvious way of preserving their memories; the decision is out of their control and attempts at preservation necessitate challenging authority, ad hoc solutions and technical expertise (often from outside).

Whether you define yourself as an archivist, a hoarder, a pirate, a cracker, an archaeologist or whatever, it is a must that you understand where the files come from. You don’t have to obey all of the wishes of the original creator, but you have to respect them. Especially if they’re still alive and kicking. The costs couldn’t kill XeNTaX, but from the looks of it archivists almost did.

Pulling Rank: The Legacy of Alexa Internet
https://datahorde.org/pulling-rank-the-legacy-of-alexa-internet/
Fri, 29 Apr 2022 17:25:26 +0000

Alexa Internet and the Internet Archive, two seemingly unrelated entities, have been partners ever since their inception. Alexa’s sunset scheduled for 1 May 2022 is, therefore, also a loss for the web archiving community. As a small send-off to Alexa, here is the story of two twins who grew apart together.


Today, the internet has become such a big part of our lives that it’s hard to imagine a time without it. Yet only 30 years ago, the internet was hardly accessible to anyone. Not in the sense that it wasn’t affordable; rather, what could be called the internet wasn’t very inter-connected. You had separate networks: ARPANET, which was heavily linked to the US’s military-industrial complex; FidoNet, which was a worldwide network connecting BBSs; USENET, a set of newsgroups mostly adopted on university campuses… Each network had a particular use-case and was often restricted to a particular demographic. It wouldn’t be until the vision of an “open web” that a common internet would emerge.

In the early 90s, many disillusioned DARPA contractors began leaving ARPANET on an exodus to San Francisco, synergising with the city’s pre-established tech ecosystem. Maybe it was the advent of new protocols such as Gopher and the World Wide Web. Perhaps it was the growing Free Software Movement. Not to mention gravitation towards the technology clusters of Silicon Valley or the Homebrew Computer Club. It was more than happenstance that California, and the San Francisco Bay Area in particular, had become home to a lot of network engineering experts.

The tricky question wasn’t how to get the internet to more people, it was how to do it the fastest. Many small companies, startups, and even NGOs popped up in San Francisco to address the different challenges of building a massive network. From building infrastructure by laying wires, to law firms for dealing with bureaucracy. Of course, there were also companies dealing with the software problems on top of hardware.

Alexa Internet Logo (1997)

One such company was Alexa Internet, founded by Bruce Gilliat and Brewster Kahle. Alexa started as a recommendation system, to help users find relevant sites without them having to manually search everything. On every page, users would get a toolbar showing them “recommended links”. You may think of these recommended webpages, like suggested videos on YouTube or songs on Spotify. Alexa was “free to download” and came with ads.

Those recommendations had to come from somewhere, and Alexa’s weren’t just randomised or purely user-based. Their secret was collecting snapshots of webpages through a crawler named ia_archiver (more on that later). This way they were able to collect stats and metrics on webpages themselves, over time. This is how Alexa’s most well-known feature, Alexa Rank, came to be. Which sites are the most popular, in which categories and when? Over time, this emphasis on web analytics became Alexa’s competitive advantage.

Alexa was a successful business that only kept growing, but founder Brewster Kahle had something of an ulterior motive. He was also in the midst of starting a non-profit organisation called the Internet Archive. ia_archiver did, in fact, stand for internetarchive_archiver. All the while Alexa was amassing this web data, it was also collecting it for long-term preservation at this up-and-coming Internet Archive. In fact, one can tell the two were interlinked ideas from the very start, as the name, Alexa, was an obvious nod to the Library of Alexandria. At one point, Alexa (not the Internet Archive) made a donation of web data to the US Library of Congress, as a bit of a publicity stunt to show the merit of what they were doing.

[For your video], there is this robot sort of going and archiving the web, which I think is somewhat interesting towards your web history. It’s a different form. You’re doing an anecdotal history. The idea is to be able to collect the source materials so that historians and scholars will be able to do a different job than you are now.

Brewster Kahle, teasing his vision for the Internet Archive in an interview by Marc Weber (Computer History Museum) from 1996. Fast-forward to 31:53 in the video below.
Tim Požar and Brewster Kahle CHM Interview by Marc Weber; October 29 1996.
Mirror on Internet Archive: https://archive.org/details/youtube-u2h2LHRFbNA

For the first few years, Alexa and the IA enjoyed this dualistic nature. One side being the for-profit company and the other a charitable non-profit, both committed to taking meta-stats on the wider internet. This came to a turning point in 1999, when Amazon decided to acquire Alexa Internet (not the smart home product) for approx. US$250 million. Alexa needed growth and the IA needed funding, so it was a happy day for everyone, even if it meant that the two would no longer act as a single entity.

Kahle left the company to focus on the IA, and former partner Gilliat ended up becoming the CEO of Alexa. An arrangement was reached so that even after the acquisition, Alexa would continue donating crawled data to supply the Internet Archive. Their collaborator Tim Požar, whom you might recognize from the ’96 interview above, would remain at Alexa for some time as a backend engineer. A lot of what Požar did was ensuring that Alexa’s crawled data would continue to be rerouted to the Internet Archive. Many of these data dumps are now visible under the IA’s Alexa crawls collection.

Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.

Afterwards, the IA and Alexa went their separate ways. The Internet Archive expanded to non-web digital collections as well. Books, in particular. The web archive part was dubbed the Wayback Machine.

By 2001, the Internet Archive was no longer a private collection but was made open to the public for browsing. The Internet Archive really lived up to its name and became the de facto hub for archiving on the web. Ever since, the IA has continued to attract not only readers, but also contributors who keep growing the collections.


As for Alexa, Amazon’s bet paid off as they dominated web analytics for the coming years. Alexa rankings became the standard metric when comparing web traffic, for example on Wikipedia. Alexa listed some public stats free to all, but remained profitable thanks to a tiered subscription system. If you needed to know the 100 largest blog sites in a given country, Alexa was your friend. Then you could pay a few dollars extra to find out what countries were visiting your competitors the most. Alexa was great, so long as you were interested in websites.

Alexa was born in a very different web. A web of sites. Yet today’s web is a web of apps. Social media, streaming services… The statistics of this web of apps are kept by centralised app markets such as Google Play and Apple’s App Store. Alexa tried to adapt; for example, they changed traffic stats to rely less on crawl data across the entire web and to also draw on shares posted to Twitter and Reddit. Sadly, these changes have not been impactful enough to save Alexa from obsolescence.

(Google Search Trend showing the rise and fall of alexa rank, alternative link.)

Amazon telegraphed their intent to either adapt or shut down by gradually dropping features over the past few months. For example, they replaced Browse by Category with a narrower Articles by Topic. Finally, the service closure was announced in December 2021.

So what will happen now? The closing of Alexa is different from most shutdowns because it’s not only the loss of data itself, but of a data stream. Alexa was, indeed, at one time a web crawling powerhouse. Yet it’s no longer uncontested. We still have, for example, Common Crawl, which, interestingly, also came out of Amazon. As for the Internet Archive, they have many partners and collaborators to continue crawling the web as well, so they won’t be alone.

Alexa was also valuable in its own right. Though there are new competitors for web analytics, you won’t see many investigating global/regional popularity, or different categories. Nor are there very many services interested in overall web traffic, as opposed to site analytics. On top of this, Alexa ran for 25 years. That’s a quarter of a century of historical data on what sites rose and fell before Alexa, unavailable almost anywhere else. Almost.

Just as Alexa helped the Internet Archive grow, from this point, the Internet Archive shall reciprocate by keeping the memory of Alexa alive. Not just the sites crawled by Alexa, but also in snapshots of public statistics gathered by Alexa.

If you have an Alexa account you can also help! Users can export Alexa data by following the instructions here! You can bet any and all data would be very valuable, either on the Internet Archive or elsewhere. Please make sure you act quickly, as there isn’t much time left until May 1.

YouTube was made for Reuploads
https://datahorde.org/youtube-was-made-for-reuploads/
Wed, 28 Jul 2021 08:57:00 +0000

The term reupload refers to a new upload of a file previously shared on the web, with minor alterations. Though a somewhat stigmatized term nowadays, reuploads can bridge the past and present, if and when the original version of something becomes unavailable.

YouTube is a platform and a community which live off of reuploads. One might even go so far as to say that reuploads have been a key to YouTube’s success and reuploads themselves have been a product of YouTube. With recent events in mind, now is as good a time as ever to re-examine the mutually beneficial relationship between YouTube and the practice of reuploading.


In 2005, YouTube started off as a small video-sharing site. At the time, few people would have been able to predict that it would grow to be the 2nd most popular site on the web, and yet here we are! One factor to which co-founder Jawed Karim attributes their success is timeliness. In particular, he thinks YouTube came at a time when clip sharing became very common. To quote from a talk he gave in 2006:

The “clip culture” you see now is basically this demand that you can find any video at any time and you can share it with other people, or you can share your own videos with other people. […]

There were a couple of events in 2004 that kind of fueled this. One was this [wardrobe malfunction]. So this, of course, happened on television, but it only happened once and never again. And so for anyone who wanted to see it after that, well they had to find it online. The other big event I remember is this [Stewart on Crossfire] interview. And you know this was also shown on television once and not after that. Everyone was talking about it, but people who missed it really wanted to be in on the joke so they would try to find it online…

Jawed Karim, r | p 2006: YouTube: From Concept to Hypergrowth (25:15)

YouTube was able to meet this clip demand, acting as a universal replay button for any clip people could imagine. It’s no coincidence that obscure/lost media fanatics were flocking to the site soon after its launch. From Sesame Street shorts to TV pilots, old footage quickly piled up! YouTube had an entire subculture of video remixes called YouTube Poops, which were made from recycled clips from old TV shows and games.

Alas this clip culture was both the boon and bane of early YouTube. As users uploaded these clips liberally, some of the owners and rights holders of the original source material of said clips came to view this practice as copyright infringement. This tension led to the infamous Viacom vs. YouTube case in 2007, where media giant Viacom sued YouTube and Google for $1 billion in alleged damages! If you are looking for a good summary, EmpLemon did a video on it a few years ago.

Viacom did not actually win the case; in fact, it came to light that they had taken advantage of clip culture for a stealth marketing campaign of their own. But the whole ordeal had lasting effects on YouTube. In an attempt to appease intellectual property owners, YouTube introduced their Content ID system, then called Video ID, for automatically detecting copyright-infringing videos.

(Video Identification ~ YouTube Advertisers. If the above video is unavailable please use this Wayback Snapshot)

All of a sudden, videos on YouTube became a whole lot more volatile. This automated system not only took down a lot of infringing material, but also hit false positives, matching short-length clips, remixes and video reviews as well. At one point you would have been lucky to have had a few of your videos deleted, as opposed to having your whole channel terminated for seemingly having one too many copyright strikes. Yet clip culture on YouTube has somehow been able to endure, even beyond this era.


You might be wondering how frequently videos on YouTube are being deleted. To put things into perspective, Archive Team ran a video survey between 2009 and 2010 to collect metadata on over 105 million public YouTube videos. By August 2010, 4 million items in this collection had been deleted, or 4.4%. This year, in 2021, a fellow Data Horde member investigated how many of the videos in this collection were still available. They estimated, from a subset* of the 2009-2010 collection, that an astounding 52% had been deleted, 4% had been made private, and about 44% remained viewable on the platform!

* the estimate was performed by crawling 50239844 videos from the dataset over the last 3 years.
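To make the arithmetic behind such an estimate concrete, here is a minimal sketch. Only the sample size comes from the footnote above; the per-status tallies are hypothetical, back-computed from the rounded percentages, and the margin of error assumes the crawled subset behaves like a simple random sample of the collection, which the source does not claim.

```python
import math

SAMPLE_SIZE = 50_239_844  # videos crawled, per the footnote above

# Hypothetical tallies, back-computed from the rounded percentages in the
# article; the real per-status counts are not given in the source.
HYPOTHETICAL_COUNTS = {
    "deleted": round(0.52 * SAMPLE_SIZE),
    "private": round(0.04 * SAMPLE_SIZE),
}
HYPOTHETICAL_COUNTS["viewable"] = SAMPLE_SIZE - sum(HYPOTHETICAL_COUNTS.values())


def share_with_margin(count, sample_size, z=1.96):
    """Return (share, margin of error) using a normal approximation.

    Assumes the sample is a simple random sample of the full collection.
    """
    share = count / sample_size
    margin = z * math.sqrt(share * (1 - share) / sample_size)
    return share, margin


if __name__ == "__main__":
    for status, count in HYPOTHETICAL_COUNTS.items():
        share, margin = share_with_margin(count, SAMPLE_SIZE)
        print(f"{status:>8}: {share:.1%} ± {margin:.3%}")
```

With a sample this large the statistical margin is negligible; any real uncertainty would come from how the subset was chosen and from the three-year crawl window.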

The term reupload probably first entered the YouTube lexicon when users began uploading new, higher quality versions of videos on their channel as YouTube kept introducing higher caps to video quality. These YouTube upgrades came around the same time as Content ID, so you will find cases where the reupload of a video has survived but the original has been deleted.

It wasn’t just the video makers themselves who were reuploading, though; soon other users also began reuploading downloaded copies they had made of their favorite YouTube videos. This was not merely due to fans appreciating content from their fellow YouTubers, but also due to the fact that the frequent channel terminations could deny the original uploader the right to reupload their channel’s videos in the first place.

YTPMV Remix: Planet Freedom, original by Igiulamam, reuploaded by oiramapap

Ironically, the term reupload soon was associated with degradation in quality as people began reuploading videos over and over again. There’s even a Gizmodo article about it from 2010. There have also been people who have complained about their work being reuploaded without permission or credit, or worse yet, plagiarised. Clearly, reuploads are a great power that came with great responsibility. Still, many diligent channels are dedicated to preserving the memory of original content through its reuploads.

The fear of such memories being lost through mass-deletions looms over YouTube, even today. Early Content ID was certainly not the last disaster to plague YouTube videos. Hacker pranks, copyright trolls, the Adpocalypse and Elsagate controversies have all taken their toll on many unfortunate channels. Today, we once more find ourselves on the brink of a scene similar to a mass-deletion, with the mass-privating of unlisted videos uploaded prior to 2017.


A few years ago it was discovered that YouTube video IDs were being generated according to a certain pattern and it was thus theoretically possible to predict video links. This presented a problem for unlisted videos, which were meant to be videos that were to be shared by link only.

Unlisted videos are a tricky subject; on one hand, a video might be unlisted, rather than privated, to make it easier to share with friends. On the other hand, many YouTubers also unlist videos such as outtakes, early revisions of videos, stream archives or off-topic content that might not fit their channel’s niche. Such videos are linked to, in video descriptions, pinned comments or Tweets. So while some unlisted videos aren’t meant for everyone’s eyes, other unlisted videos are only hidden from the channel interface and search results. Yet an exploit is an exploit, and URL predictability could be a serious problem for certain videos.

Some action certainly had to be taken here, so in 2017 the video ID formula was changed into something less predictable. That was definitely a step in the right direction. What is happening today, four and a half years later, is a security update to set a sizable number of unlisted videos uploaded prior to that date to private. Thus, several million videos have suddenly been virtually deleted, as they are no longer accessible to anyone but the channel owner. While this decision will secure potentially private content for many channels, it is also a great loss for inactive channels who unlisted videos liberally and were not able to opt out of the decision.

Our Unlisted Video Countdown on Twitter

On the bright side, channels which are still active can set their videos to public at a later date. In fact, YouTube goes so far as to encourage these channels to re-upload their own videos to be able to take advantage of the new URL system. Except, it’s not just the original uploaders and video makers who are reuploading. Reuploads from other users who had previously downloaded unlisted videos are also starting to pop up, the same as it ever was.

YTPMV Remix: 00000000.restored.wmv.
Original upload by HOZKINS, reuploaded by IAMGOOMBA, re-reuploaded by aydenrw.

With tools like youtube-dl or Reddit’s SaveVideo, the YouTube community is pulling together to salvage whatever they can from old unlisted videos. And they are only getting better: Archive Team’s unlisted video project hit over 200TB of data. As videos die off, here are some folks desperately trying to revive them, trying to uphold what one might call their online heritage.

A few days ago one of the oldest videos on YouTube was made public from unlisted. It was originally uploaded on April 29, 2005. Titled Premature Baldness, it too is a reupload and final memento from a chasebrown.com which is no longer recognizable. A whisper to remind us that while invoking the right to be forgotten we ought not to neglect, on the other hand, a right to be remembered…

Why We Shouldn’t Worry About YouTube’s Inactive Accounts Policy
https://datahorde.org/why-we-shouldnt-worry-about-youtubes-inactive-accounts-policy/
Wed, 14 Jul 2021 02:31:52 +0000

From time to time, YouTube users and archivists worry that, because of YouTube’s Inactive Accounts Policy, YouTube channels will be deleted if they are left inactive for more than six months. The policy reads:

Inactive accounts policy
In general, users are expected to be active members within the YouTube community. If an account is found to be overly inactive, the account may be reclaimed by YouTube without notice. Inactivity may be considered as:
- Not logging into the site for at least six months
- Never having uploaded video content
- Not actively partaking in watching or commenting on videos or channels

This policy is not new. Much of the text of this policy actually dates back to at least June 17, 2009, when the policy was originally introduced as part of YouTube’s username policy for username squatting. At the time, the policy was designed to prevent inactive users from holding valuable usernames or usernames that match brand names. This is because, from YouTube’s launch in 2005 until March 2012, every YouTube channel had to choose a unique username that would form its permanent /user/ URL. Additionally, from 2012 until November 2014, all channels could optionally sign up with or create a permanent username without having to meet any eligibility requirements. Because usernames were in such high demand, the original policy stated that the usernames of reclaimed accounts may be “made available for registration by another party” and that “YouTube may release usernames in cases of a valid trademark complaint”, though the former passage was removed by October 9, 2010.

Since November 24, 2014, YouTube’s username system has been replaced by a custom URL system with minimum eligibility requirements that are more difficult to meet using inactive accounts or accounts created just for squatting usernames. As of July 2021, accounts need to be at least 30 days old, have at least 100 subscribers, and have uploaded a custom profile picture and banner in order to claim a custom URL. Additionally, with the new system, YouTube is able to “change, reclaim, or remove” these custom URLs without otherwise affecting the associated channel. As such, the Username Squatting Policy was no longer necessary for its original purpose.
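The July 2021 requirements listed above amount to a simple conjunction of checks, which can be sketched as follows. The `Channel` record and the function name are hypothetical, invented purely for illustration; they do not correspond to any YouTube API, and the thresholds are only those stated in this article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Channel:
    """Hypothetical record; not a type from any YouTube API."""
    created: date
    subscribers: int
    has_custom_picture: bool
    has_custom_banner: bool


def eligible_for_custom_url(channel: Channel, today: date) -> bool:
    """Apply the July 2021 requirements as described in the article."""
    return (
        (today - channel.created).days >= 30  # at least 30 days old
        and channel.subscribers >= 100        # at least 100 subscribers
        and channel.has_custom_picture        # custom profile picture
        and channel.has_custom_banner         # custom banner
    )
```

Under these rules, a long-abandoned channel with no subscribers or customisation fails the check regardless of its age, which is why squatting became harder under the new system.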

At some time between February 2013 and March 2014, the Username Squatting Policy was renamed to the “Inactive accounts policy” and the sentence about releasing trademarked usernames was removed. As of July 2021, the policy has not been revised since then. It also appears that the policy has fallen into disuse: in March 2021, a Reddit user posted “As a trusted flagger I can tell you that YouTube hasn’t used that policy in years.”

Additionally, at some point between September 2014 and March 2015, YouTube created a new support article which stated that “Once a username was taken by a channel it could never be used again, even if the original channel was inactive or deleted”, which directly contradicts the purpose of the original Username Squatting Policy.

Some archivists fear that the large amounts of video data being stored from inactive accounts may be lost if YouTube decides to delete those accounts. However, it appears that YouTube has found a way to help offset some of the cost of storing these videos. On November 18, 2020, YouTube announced that they would enable advertisements on videos posted by channels that are not members of the YouTube Partner Program. While no explanation was given for this change, it was announced during the same 3 months in which Google announced several major changes that would reduce the amount of storage being used across the company’s products [1] [2], so it can be inferred that this policy change was made for the same reason.

So, why does the policy still exist? One possible reason is that the policy is simply forgotten. YouTube’s support site is large and contains many articles, and many of them have outdated passages and describe discontinued features that were removed long ago [1] [2] [3] [4]. Many pages also contain references to the old version of YouTube, which has been inaccessible to the public since December 2020 [1] [2] [3]. Also, as of 2021, the text of the Inactive Accounts Policy hasn’t been updated for at least 7 years, though the surrounding page was updated in September 2020 to remove the policy on vulgar language, which had been given its own page. So, YouTube could have simply forgotten that the Inactive Accounts Policy exists, and the people responsible for updating the support pages could have just left the policy because they weren’t specifically instructed to remove it.

Another possible reason the policy still exists is that, while unlikely, YouTube could be preserving the policy for possible use in the future. However, YouTube would provide advance warning to users, likely via email and updated support articles, before enacting this policy, and since we have seen none of those shared online, we have no reason to believe this policy is being enacted at the current time.

So, while YouTube has an Inactive Accounts Policy, it hasn’t used it in years because URLs on the service can now be changed and removed without deleting and recreating accounts, and it appears it has found a way to help offset some of the cost of storing the videos uploaded by these channels. At this time, users and archivists shouldn’t worry about this policy, but should instead focus on specific content removal announcements, such as those for annotations, liked videos lists, draft community contribution closed captions and metadata, playlist notes, and older unlisted videos.

]]>
https://datahorde.org/why-we-shouldnt-worry-about-youtubes-inactive-accounts-policy/feed/ 2 2481
A New Breed of Digital Archiving and Preservation https://datahorde.org/a-new-breed-of-digital-archiving-and-preservation/ https://datahorde.org/a-new-breed-of-digital-archiving-and-preservation/#respond Thu, 05 Nov 2020 23:05:42 +0000 https://datahorde.org/?p=1711 There it is, because someone thought it ought to be out there. Perhaps a story you read, a picture you saw or even a game you played… It was there because someone poured their heart and soul into making it, and it mattered.

Alas, we find ourselves in an age where everyone collectively suffers from short-term memory loss. All that is ever on our minds is what’s relevant, here and now, and everything that is irrelevant is as good as imaginary.

The digital archivist or preservationist’s job is, ultimately, to save those things that matter. To that end they go to great lengths: downloading terabytes of data, reverse engineering decade-old websites or even hunting down the source code for the most obscure software.

But it is not an easy job and it certainly is not getting any easier. Things are disappearing at too fast a rate for even the most attentive archivists to be able to keep up. Today, the digital archivist is fighting against currents they cannot overcome, with outdated wisdom. As it were, the archivist ought to find a way to swim with these currents, by taking advantage of tools and options that would have previously been unavailable to them.

In honor of Digital Preservation Day, I myself, as a digital archivist, would like to offer my own two cents for the next generation, that is to say a new breed of digital archivists. The three As: Adaptability, Acceptance and Acknowledgment.


Adaptability:

Keeping track of what is being retired, what websites are dying and mobilizing as quickly as possible!
[Image: flat lay shot of tools. Photo by Miguel Á. Padriñán on Pexels.com]

This past September deserves to go down in history as Shutdown September, seeing how many websites were shut down that month. And honestly, archivists were barely able to keep up.

  • Archive Team only got to work on archiving the massive Chinese social media site Tencent Weibo about 10 days after the shutdown announcement. While 248 TB is an impressive feat, it’s only a fraction of the web content on Tencent Weibo. A lot more could have been grabbed had action been taken sooner.
  • A similar case was the shutdown of Naver Matome, a kind of Japanese tumbleblog. Despite the early shutdown announcement in July, Archive Team’s archival project only began months later in September, with only about a week to spare.
  • YouTube recently removed their community contributions feature and Data Horde started a project to save unpublished drafts. Although the feature removal was known for 2 months, it took us weeks to notice that drafts were at risk and even longer to note that YouTube had restricted the feature last year, leading to many drafts never being published. While we were lucky in that the drafts remained accessible for a month beyond the expected deadline, we might not have been as lucky.

Clearly, there is a need for a watchdog, or two, or three, to be able to inform preservation groups of websites which are closing down, or features being retired, before the last minute. While tech news sites like The Verge might occasionally report on shutdowns, these reports are generally restricted to English websites. As for archivists, there are mainly three outlets:

  1. Archive Team’s Watchlist page.
  2. The Internet Archive Blog, if they are involved.
  3. And us, Data Horde, whenever we find out about a shutdown.

Other than that, it’s a matter of luck if a shutdown announcement makes it to the top page on Hacker News or Reddit. This is unreliable and we need to do something about it.

For starters, we need to monitor not only individual websites, but also massive platforms like YouTube, Twitter and Reddit. We need our own unofficial open documentation to note planned changes when they are hinted at in tweets or in blogs, prior to official announcements, to be able to mobilize dynamically.

As for websites, especially non-English ones, we need to make it easier for non-archivists who are concerned to be able to reach out to us. Which brings us to the second A: Acceptance…


Acceptance:

Come as you are, we do not just need programmers and librarians, anyone can contribute to preservation and everyone should!
[Image: multicolored umbrella. Photo by Sharon McCutcheon on Pexels.com]

The archiving community is, by and large, in favor of collaboration and open source. But currently this only applies among archivists and preservationists. Even with the source code and tools out in the open, the average person will most likely lack the technical knowledge to understand what the hey they’re looking at.

As difficult as it might be to admit, digital archiving is not very well known. Even the words preservation or conservation evoke ancient manuscripts, or endangered animals. Digital preservation is far from the first thing to come to mind and this degree of obscurity is not something to be desired.

Even if this obscurity comes with a sense of pride from the joy of being one of a select few who know this craft, it has also come back to bite us. There is a good reason that a lot of people who discovered the Internet Archive through its National Emergency Library experiment earlier this year weren’t the biggest of fans. Naturally, people were more inclined to trust the verdict of their favorite authors, or to speculate, than to go and read what the NEL actually was and how it was justified.

If we, as preservationists, don’t promote our own work, why should other people? For all they know, we are all just rogue people with malicious intent. Then are we to grow old reclusively in our obscure hobby? Seeing how few of us archivists there are out there, I find it sad how many of us have made a name for ourselves as “grumpy old men”. We don’t have to be vagabonds, and we shouldn’t be. Because if we choose to alienate ourselves from the rest of the world, amateurs will develop their own archiving techniques to take our place.

  • The Save Yahoo Groups/Yahoo Geddon project was initially led by fandom. While Yahoo Groups might not mean a lot in 2020, many fan groups trace their origins to old mailing lists, some of which Yahoo had later acquired. When it was announced that all public groups would be privated, those people knew what was at stake; the history, the works, the memories of two decades.
    So they blindly charged in. None of them were proper digital preservationists, even if some members might have had an affinity for it. But they organized and developed their own method of hunting down and tracking downloads for ancient groups that the world had forgotten.
  • Another similar project is BlueMaxima’s Flashpoint, a massive effort to preserve Flash and other multimedia web-content which will (or might) break in the future due to incompatibility. Initial volunteers to the project had some programming knowledge and were motivated to preserve games from their childhood. But they discovered that if they laid down a clear path, other people in a similar situation would be more than happy to contribute.
    They developed their own tools for downloading and curating games into their Flashpoint collection so volunteers would not have to start from scratch. And they did not shy away from letting their project be publicized.

Both of these projects have come a long way, in large part due to the sheer enthusiasm to initiate these projects and later due to extensive help from seasoned archivists who took note of these ideas and supported and nurtured them.

The bottom line is, we need to not shoo people away, but embrace them. If an apprenticeship system seems too degrading, then at the very least we, as archivists, should take note from Save Yahoo Groups and Flashpoint when it comes to writing our tutorials and publicizing our projects.

We shouldn’t merely lurk on IRC chatrooms, we need to be able to reach the same people we’re trying to help, even if it’s on social media. DEF CON is cool, but wouldn’t it be nice if we could stand on our own two legs and had a convention of our own?

And when people come to us, asking for help, with no credentials whatsoever, we need to learn how to help. Which brings us to the third A: Acknowledgment.


Acknowledgment:

Understanding our circumstances and constraints, recognizing that we are not all on equal footing and that we might have very different goals and respecting one another.
[Image: photo of people near wooden table. Photo by fauxels on Pexels.com]

It is entirely possible for a seasoned archivist to encounter someone who is much less knowledgeable in archiving, or even programming, asking for help. And as someone who dedicates their time to observing different archiving communities, I can acknowledge that every group has its own focus and policies.

  • Mature archiving groups generally have a lot of tutorials and hardware they are more than happy to share, for their particular focus. Here, let’s acknowledge that every group has their own particular focus: information on a lost commercial is likelier to show up on the Lost Media Wiki, while news on a delisted game is more Dead Game News territory.
  • Semimature archiving groups generally function as cliques: they have the expertise but are constrained by resources. Anyone expecting help from higher-ups within the group should first make a name for themselves, to prove that their ideas are worth the high council’s time.
  • Some other groups are one-man armies: they find a very specific niche and document whatever they can. They might look to expand, or insist their archive is only for the fun of it.

So mature and small groups are comparatively stable, whereas semimature groups are considerably volatile. This is not really something we can change, so instead we should learn to acknowledge it. Some groups are always itching to lend a hand but other groups really won’t be willing to give outsiders, who have not proven their value and sincerity, the time of day.

Next we have to acknowledge expenses. Preservation is expensive, digital or otherwise. For some communities digitizing physical material is not easy, as the incomplete digitizations prior to the unfortunate fire at the National Museum of Brazil go to show. It is a harsh reality that we do not receive the necessary subsidies from states, often relying on donations.

Again, mature groups are self-sustaining. For every member who retires, there will be new volunteers joining in. They also often have some form of donation system to maintain and expand their hardware. As for archiving groups which are still developing, they will have their ups and downs. A single Tweet or Reddit post can bring in tens of new people overnight. Conversely, it’s not uncommon for splinter groups to emerge if their focus diverges from the majority. Money and hardware might be supplied through donations, but again it could always be cut off. Most notably, The Eye has been experiencing a lot of downtime and struggling to cover costs these past few months. Of course, if you can’t cover the costs in the first place you’re better off avoiding anything ambitious.

As much as we archivists advocate for open access, it’s a harsh reality that we are, more often than not, unable to actually provide this. People outside of archiving circles have already discovered how to turn a profit out of archived material, through ad revenue or paywalls. Take getdaytrends.com as an example, where you can look up Twitter trends per day, while they cover their costs through ads. Then perhaps some of us smaller archiving groups could survive through such practices, until we can achieve a more mature stage.

But as a maxim: Access should be opened chronologically, so that nothing which has been opened before is closed off. (E.g. uploads from the last year are behind a paywall, but on 1 January 2021, uploads from 1 January 2020 go open access.)
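To make the maxim concrete, here is a minimal sketch in Python of such a rolling paywall. The function names and the one-year default are assumptions made for illustration, not something any of the groups mentioned here actually run.

```python
from datetime import date


def release_date(uploaded: date, embargo_years: int = 1) -> date:
    """Return the day an upload leaves the paywall (hypothetical helper)."""
    try:
        return uploaded.replace(year=uploaded.year + embargo_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year,
        # so this sketch assumes release on 1 March instead.
        return date(uploaded.year + embargo_years, 3, 1)


def is_open_access(uploaded: date, today: date, embargo_years: int = 1) -> bool:
    """True once the embargo has passed; it never flips back as time moves on."""
    return today >= release_date(uploaded, embargo_years)
```

Because the check only compares today’s date against a fixed release date, an upload that has gone open access stays open, which is the whole point of the maxim.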

Let’s talk a bit about a community which actually is trying something similar to this, namely OldGameMags. OldGameMags was only a small group of people who’d met each other over the internet and had been pooling resources to gather old game magazines. And they were barely receiving any donations. So they came up with a clever idea: to gain access you either had to make a one-time donation, or actually send in a magazine to be scanned into their collection.

Except not everyone was happy with this. Back in June, OldGameMags and the Internet Archive were at a stand-off. OldGameMags proprietor Kiwi discovered that some of their magazine scans were being uploaded to the Internet Archive and, surprisingly, this was a repeated offense. While the Internet Archive is a massive, well-funded and well-known archiving platform, OldGameMags is a small group who gathered everything they had with blood, sweat and tears. From a utilitarian point of view, the IA volunteer who was mirroring OGM might have found a home which could host these with a much lower burden and make them accessible for a cheaper cost. But on the other hand, if they continued mirroring everything on OGM, there would no longer be any reason for people to join OGM. This was their passion and from OGM’s point of view, it was as good as stealing (even if a donation had been made). Eventually the two sides were able to reach a settlement, but if IA and OGM had been able to better communicate their intentions, such a conflict of interest might have been resolved a lot sooner.

Then finally, as it stands, we must acknowledge one another. We’re all in this preservation business together after all, even if our goals might differ. To reiterate, most of us don’t have states, companies or foundations subsidizing our efforts. So we ought to support one another, technically and financially.

This is why we need arbiters such as the DPC, SPN and IIPC to bring us together. We need to do what we can to preserve the preservationists, because nobody else will! The best way to do that is to acknowledge one another, working synergistically and doing our best to not offend one another.


[Image: shallow focus photography of hourglass. Photo by Jordan Benton on Pexels.com]

Preservation means preserving the past, yet we do it not for the past, but for the future. Then what right do we have to object so vehemently to getting with the times? The world is changing, it’s high time we caught up.

We need adaptability to achieve dynamism, to keep up with the rapid decay of digital information. We need acceptance to foster new ideas and train new archivists. And we need acknowledgment to protect and support one another.

So as we enter a new decade, may these words usher in a new era. A new era for a new breed of digital archiving and preservation…

]]>
https://datahorde.org/a-new-breed-of-digital-archiving-and-preservation/feed/ 0 1711