history – Data Horde
https://datahorde.org – Join the Horde!

Community Spotlight: Pikmin Archives
https://datahorde.org/community-spotlight-pikmin-archives/
Tue, 20 Feb 2024 21:30:09 +0000

Who are they?

Pikmin Archives is a group dedicated to collecting developer notes and promotional material on the Pikmin series of games.

What do they do?

The Pikmin series is made up of RTS games where players must guide a swarm of aliens to thrive in the wild! The Pikmin games are very memorable for their unique art style, contrasting everyday objects with sci-fi technology and fantastical nature. Celebrating that art style, Pikmin Archives is focused on documenting the creative process behind the Pikmin games.

For example, Pikmin Archives member Flamsey restored the old Pikmin 2 USA website, which had ceased to function due to the discontinuation of Flash Player.

How do they do it?

Pikmin Archives is most active on their Discord server, which is a hub for exchanging files and fostering discussion. There, a dedicated #archive-submissions channel is used to submit media, and submissions are then curated by the mod team.

Occasionally, members might post their findings to Twitter, but there is no dedicated Pikmin Archives social media account or website at this time.

How do I sign up?

Just hop on board their Discord Server!

So what are you waiting for? Become a Pikmin Archivist, today!


Looking to discover other archiving communities? Just follow Data Horde’s Twitter List and check out our other Community Spotlights.

Twitter in Trouble? Why you should Archive your Tweets
https://datahorde.org/twitter-in-trouble-why-you-should-archive-your-tweets/
Mon, 05 Dec 2022 17:04:49 +0000

Twitter has seen some radical restructuring since Elon Musk’s acquisition over a month ago. Now is as good a time as ever to talk about what options you have for archiving or preserving your Twitter content.


This new era of Twitter has been quite turbulent, to say the least. More than half of the workforce has been fired or has quit, and site functionality is becoming unstable, as reported by the Seattle Times. Mastodon has emerged as a serious Twitter alternative. In fact, some of those who have departed Twitter now have their own Mastodon instance over at macaw.social. Personally, I am excited about the rise of Mastodon as an alternative, as I have been posting Data Horde updates there for about two years now.

So, why not leave Twitter behind and move on? Now, Twitter allows you to request a copy of your personal data: Tweets and all. But it’s probably hard to leave a site that you have been on for over a decade, especially when requesting your personal archive is not even working correctly. Many people have reported that archive requests are being ignored or processed with delay. On a test account, we at Data Horde found that it took over 3 days to receive a personal archive.

Tweeters complaining about being unable to export personal archives: view snapshot at archive.is

In 2022 this is a big deal, not only for archivists but also legally. Article 15 of the GDPR mandates a responsibility to provide a copy of collected data to users (i.e. data subjects) upon request. Outside of Europe, California’s CCPA has a similar clause protecting the right to know.

There are repercussions for not respecting these rules. Recently another messaging app, Discord, was fined €800,000 by the French regulator CNIL for failing to respect data retention periods and the security of personal data. That was actually a reduced fine, given Discord’s conciliatory attitude. If Twitter does not up their game, they may meet a similar fate, if not a worse one.

Now that I have your attention, I would like to direct it to the help page on how to request a personal archive from Twitter: https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive . Even if the process is a bit unstable, this is what you need to follow to save a copy of your Tweets.

The Twitter archive is big and burly, but not perfect. Johan van der Knijff recently wrote a blog post on some shortcomings, such as the t.co URL shortener, and some workarounds: https://www.bitsgalore.org/2022/11/20/how-to-preserve-your-personal-twitter-archive


Oh, and by the way, it gets worse: Elon Musk has also stated interest in purging inactive accounts and their Tweet history.

Archive Snapshot: https://archive.ph/hcKsV

This might not seem like a big deal, except to the one or two of our readers who periodically scrape politician accounts off of https://ballotpedia.org. Yet it is actually a serious turning point. Currently, Twitter does not purge inactive accounts, except in the event of death or incapacitation, and only by special request.

In 2019 there was an attempted Twitter policy change to expire accounts which had not been logged into for 6 months. This sparked outrage across the platform from those who saw it as unfair to the memory of inactive accounts. In particular, fans of deceased K-Pop artist Kim Jong-hyun, otherwise known as Jonghyun (김종현/종현), came to the defence of his legacy, overturning the attempt altogether. Going back on that decision would go against all of that heritage: people’s heritage, Twitter’s heritage, web heritage. Alas, this is the projected course of things; even if we cannot prevent it, it is perhaps our duty to protest why it is wrong.


What about the extreme scenario of a total collapse of Twitter? What does that mean for web history? Well, the good news is that people have been thinking about this for much longer than just this past year.

Already in 2010 the Library of Congress announced that they would be copying the entire internal archive of Twitter, starting from March 2006.

Archive Snapshot: https://web.archive.org/web/20161208074132/https://twitter.com/librarycongress/statuses/12169442690

There are also many smaller grabs on the Internet Archive and archive.today, some of which you have seen linked above. Special mention goes to Archive Team‘s periodic Twitter Stream archive.

Last but not least, you can help! The Internet Archive is collecting Tweet dumps from people as we speak: https://archive.org/services/wayback-gsheets/archive-your-tweets . Whether you just want extra insurance for your back-up or want to contribute to the wealth of the web, you can use the above tool to upload your Tweets to the Internet Archive for generations to come.

Archive95: The Old Man Internet
https://datahorde.org/archive95-the-old-man-internet/
Thu, 21 Jul 2022 23:28:13 +0000

The internet is kind of old. To be fair, so is the Internet Archive and its Wayback Machine. But IA isn’t older than the internet (how could it be?), so there are some things that could slip through the cracks. Things before its founding in 1996, for example.

Then along comes Archive95, an archive of the pre-IA internet of 1995. It primarily uses two sources, the World Wide Web Directory and the German-language Einblick ins Internet, to give an impression of an era when the web was small and monitors were bulky as heck.

– glmdgrielson, a young whippersnapper

Pulling Rank: The Legacy of Alexa Internet
https://datahorde.org/pulling-rank-the-legacy-of-alexa-internet/
Fri, 29 Apr 2022 17:25:26 +0000

Alexa Internet and the Internet Archive, two seemingly unrelated entities, have been partners ever since their inception. Alexa’s sunset, scheduled for 1 May 2022, is therefore also a loss for the web archiving community. As a small send-off to Alexa, here is the story of two twins who grew apart together.


Today, the internet has become such a big part of our lives that it’s hard to imagine a time without it. Yet only 30 years ago, the internet was hardly accessible to anyone. Not in the sense that it wasn’t affordable; rather, what could be called the internet wasn’t very interconnected. You had separate networks: ARPANET, which was heavily linked to the US’s military-industrial complex; FidoNet, a worldwide network connecting BBSs; USENET, newsgroups mostly adopted on university campuses… Each network had a particular use-case and was often restricted to a particular demographic. It wouldn’t be until the vision of an “open web” that a common internet would emerge.

In the early 90s, many disillusioned DARPA contractors began leaving ARPANET on an exodus to San Francisco, synergising with the city’s pre-established tech ecosystem. Maybe it was the advent of new protocols such as Gopher and the World Wide Web. Perhaps it was the growing Free Software Movement. Not to mention the gravitation towards the technology clusters of Silicon Valley and the Homebrew Computer Club. It was more than happenstance that California, and the San Francisco Bay Area in particular, had become home to a lot of network engineering experts.

The tricky question wasn’t how to get the internet to more people; it was how to do it the fastest. Many small companies, startups, and even NGOs popped up in San Francisco to address the different challenges of building a massive network: from laying the wires of the physical infrastructure, to law firms for dealing with bureaucracy. Of course, there were also companies dealing with the software problems on top of the hardware.

Alexa Internet Logo (1997)

One such company was Alexa Internet, founded by Bruce Gilliat and Brewster Kahle. Alexa started as a recommendation system, to help users find relevant sites without having to manually search for everything. On every page, users would get a toolbar showing them “recommended links”. You may think of these recommended webpages like suggested videos on YouTube or songs on Spotify. Alexa was “free to download” and came with ads.

Those recommendations had to come from somewhere, and Alexa’s system wasn’t just randomised or purely user-based. Their secret was collecting snapshots of webpages through a certain crawler named ia_archiver (more on that later). This way they were able to collect stats and metrics on the webpages themselves, over time. This is how Alexa’s most well-known feature, the Alexa Rank, came to be: which sites are the most popular, in which categories, and when? Over time, this emphasis on web analytics became Alexa’s competitive advantage.

Alexa was a successful business that only kept growing, but founder Brewster Kahle had something of an ulterior motive. He was also in the midst of starting a non-profit organisation called the Internet Archive; ia_archiver did, in fact, stand for internetarchive_archiver. All the while Alexa was amassing this web data, it was also collecting it for long-term preservation at this up-and-coming Internet Archive. In fact, one can tell the two were interlinked ideas from the very start, as the name Alexa was an obvious nod to the Library of Alexandria. At one point Alexa, not the Internet Archive, made a donation of web data to the US Library of Congress, as a bit of a publicity stunt to show the merit of what they were doing.

[For your video], there is this robot sort of going and archiving the web, which I think is somewhat interesting towards your web history. It’s a different form. You’re doing an anecdotal history. The idea is to be able to collect the source materials so that historians and scholars will be able to do a different job than you are now.

Brewster Kahle, teasing his vision for the Internet Archive in an interview by Marc Weber (Computer History Museum) from 1996. Fast-forward to 31:53 in the video below.
Tim Požar and Brewster Kahle, CHM interview by Marc Weber; October 29, 1996.
Mirror on Internet Archive: https://archive.org/details/youtube-u2h2LHRFbNA

For the first few years, Alexa and the IA enjoyed this dualistic nature: one side the for-profit company, the other a charitable non-profit, both committed to taking meta-stats on the wider internet. This came to a turning point in 1999, when Amazon decided to acquire Alexa Internet (not the smart home product) for approx. US$250 million. Alexa needed growth and the IA needed funding, so it was a happy day for everyone, even if it meant that the two would no longer act as a single entity.

Kahle left the company to focus on the IA, and former partner Gilliat ended up becoming the CEO of Alexa. An arrangement was reached so that even after the acquisition, Alexa would continue donating crawled data to supply the Internet Archive. Their collaborator Tim Požar, who you might recognize from the ’96 interview above, would remain at Alexa for some time as a backend engineer. A lot of what Požar did was ensuring that Alexa’s crawled data would continue to be rerouted to the Internet Archive. Many of these data dumps are now visible under the IA’s Alexa crawls collection.

Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.

Afterwards, the IA and Alexa went their separate ways. The Internet Archive expanded to non-web digital collections as well. Books, in particular. The web archive part was dubbed the Wayback Machine.

By 2001, the Internet Archive was no longer a private collection but was made open to the public for browsing. The Internet Archive really lived up to its name and became the de facto hub for archiving on the web. Ever since, the IA has continued to attract not only readers, but also contributors who keep growing the collections.


As for Alexa, Amazon’s bet paid off, as they dominated web analytics for years to come. Alexa rankings became the standard metric when comparing web traffic, for example on Wikipedia. Alexa listed some public stats free for all, but remained profitable thanks to a tiered subscription system. If you needed to know the 100 largest blog sites in a given country, Alexa was your friend. Then you could pay a few dollars extra to find out which countries were visiting your competitors the most. Alexa was great, so long as you were interested in web-sites.

Alexa was born in a very different web: a web of sites. Yet today’s web is a web of apps: social media, streaming services… The statistics of this web of apps are kept by centralised app markets such as Google Play and Apple’s App Store. Alexa tried to adapt; for example, they changed traffic stats to be based less on crawl data across the entire web and more on shares posted to Twitter and Reddit. Sadly, these changes were not impactful enough to save Alexa from obsolescence.

(Google Search Trends graph showing the rise and fall of Alexa Rank; alternative link.)

Amazon telegraphed their intent to either adapt or shut down by gradually dropping features over the past few months. For example, they replaced Browse by Category with a narrower Articles by Topic. Finally, the service closure was announced in December 2021.

So what will happen now? The closing of Alexa is different from most shutdowns, because it’s not only the loss of the data itself, but of a data stream. Alexa was indeed, at one time, a web-crawling powerhouse, yet it was no longer uncontested. We still have, for example, Common Crawl, which interestingly also came out of Amazon. As for the Internet Archive, they have many partners and collaborators to continue crawling the web, so they won’t be alone.

Alexa was also valuable in its own right. Though there are new competitors in web analytics, you won’t see many investigating global or regional popularity, or different categories. Likewise, there aren’t very many services interested in overall web traffic, as opposed to single-site analytics. On top of this, Alexa ran for 25 years. That’s a quarter of a century of historical data on which sites rose and fell, unavailable almost anywhere else. Almost.

Just as Alexa helped the Internet Archive grow, from this point on the Internet Archive shall reciprocate by keeping the memory of Alexa alive: not just the sites crawled by Alexa, but also snapshots of the public statistics Alexa gathered.

If you have an Alexa account, you can also help! Users can export Alexa data by following the instructions here! You can bet any and all data would be very valuable, either on the Internet Archive or elsewhere. Please make sure you act quickly, as there isn’t much time left until May 1.

Interview with Hubz of Gaming Alexandria
https://datahorde.org/interview-with-hubz-of-gaming-alexandria/
Mon, 18 Apr 2022 09:09:30 +0000

Hello, here’s another interview, this time with our head overlord Hubz of Gaming Alexandria.

glmdgrielson: So, first question, what is Gaming Alexandria?
Hubz: At its core, it’s both a Discord community and a separate website dedicated to preserving various aspects of video games, such as scans, interviews, unreleased games, YouTube videos, etc. It mainly started as a site where I could share high-quality scans, but has grown thanks to many people joining up with various skills to help expand the website. The Discord community itself is really an entity unto itself at this point, where lots of gaming historians/preservationists have come together to share their works and also help each other out when needed with various projects. I love getting to see all the passion in everybody’s projects that they put forth and the willingness of the community to offer help when asked.

g: Tell me more about this community. I’m active in the server, but what does it look like from your end?
H: From an admin standpoint I have access to all the channels, which include the private #staff and #mods channels where we discuss upcoming articles or projects for the site, as well as handling the occasional argument or bad apple in the chat. Dylan Mansfeld (DillyDylan) handles a lot of great articles on undumped/prototype games that were previously unreleased. Ethan Johnson writes for his own blog (https://thehistoryofhowweplay.wordpress.com/) and Gaming Alexandria at times, and is our editor, so he glances through and cleans up all the articles that get posted. Jonas Rosland, who is the Executive Director of the NPO I’m a board member of, called Hit Save (https://hitsave.org/), does a lot of thankless technical work behind the scenes, including a NAS he has set up for not only the staff of the website to store project files but the community at large, which is a huge help. Wietse van Bruggen (Densy) handles a lot of the moderation of the chat and has been a huge help keeping the Discord community friendly and clean with his balanced moderation style. Last but not least, there is Stefan Gancer (Gazimaluke), who did the original site redesign and has been a great idea man for ways to improve the site and community as time has gone on. For me personally, I try to keep up with all the chat in the channels (though it can be tough at times!) just to have an idea of what’s going on, seeing what I can help with or connecting people to further projects, as well as posting my scans and projects as they’re completed. Thanks to the rest of the staff I rarely have to step in and moderate, which is very nice!

g: I’m going to skip over the omission of Norm and ask about the history of how the site has evolved.
H: LOL yes Norm is a menace to society and must be stopped.

Editor’s note: Hubz has a mock rivalry with Norm, a.k.a. the Gaming Historian, which is a frequent running gag on the server. I do not believe there is actual malice.

The website itself started officially on October 23rd, 2015, and was just a basic text website that I could easily upload to in order to share my scans; it was very barebones. The reason I wanted to get high-quality scans out was due to using an emulator frontend called Hyperspin. For popular systems it had a lot of decent-quality artwork for boxes, but for lesser-known systems it was sorely lacking, and that triggered my OCD and made me realize that scanning stuff in high resolution was something that needed to be done. Slowly but surely, I met others who wanted to scan in high quality and have their stuff hosted, and they would submit stuff, such as Densy. At some point I got involved with the VGPC Discord and met Kirkland, who had been quietly doing something similar with his collection, and collaborated with him and others on establishing scanning standards to use going forward, to have some level of consistent quality with those that were willing to follow them, which eventually led to what is now the https://scanning.guide/. In late 2018 the site was graciously redone by Gazimaluke and relaunched in the design you see now. We started branching out into actual articles written by our staff and releasing prototypes and unreleased games that we came across. The site continues doing this to this day, though we are branching out into more guest authors from the community posting interviews and articles as well in the near future.

g: As well as hosting my site, for which I am grateful. So, what is the day-to-day like for you?
H: Day to day on the scanning, I try to get at least one magazine done daily. It doesn’t always happen but, in general, I debind a magazine the night before, then in the morning scan it in before leaving for work. If work gets slow I work on processing the scans, or else I’ll do it later that night and get them uploaded to the site and the Internet Archive.

g: Interesting. So how big do you think your archive is by this point?
H: Archive upload-wise I’m probably right around 2900 items, if you count stuff that was removed lol. Then there’s a bunch on the site that wasn’t done to the higher scanning standards I go by now, which isn’t on the Archive. So I’d guess in the 3000-4000 item range currently.

g: Do you know how big it is in terms of filesize?
H: Let me see real quick…
Looks like 2.5 TB, which is another reason I’m so thankful to have the Internet Archive to host my scans on, due to the space and bandwidth that would be required otherwise. The site alone usually has about half a TB of traffic per month, so I can only imagine what it would be like if the magazine scans were also hosted directly on it.

g: Neat. Is there anything interesting that you got to be a part of due to GA that you would like to share?
H: The biggest thing is probably working with The Video Game History Foundation on scanning their extensive magazine collection, so digital copies can be provided along with physical copies at their library. Being able to leverage the Internet Archive so people all over the world can easily access the magazines I’ve scanned, which they might not have been able to otherwise, is a great feeling personally for me. So many of these things are quite difficult to acquire and expensive as time goes on, so having them as an ally in the preservation world is a godsend. There’s been lots of other connections and other projects I’ve worked on as well, but I won’t ramble forever on that. Not only is Gaming Alexandria a tight community that likes to help each other out, but there’s plenty of other preservation groups like VGHF, TCRF, and Hidden Palace, just to name a few, and we all get along great and try to push preservation forward together. There’s so much work that needs to be done that we need all the help we can get, and we need to support each other any way we can, I think.

g: True that. Last question for now: anything that you would recommend to a would-be archivist?
H: I think it’s a good idea to preserve what interests you, which seems to go without saying, but I mean it more in the sense of not only going after what is popular. While you might not get much fanfare initially for the more obscure stuff, it’s likely you’ll be the only one doing it, and it’s important that it’s being done. If you do good work for long enough it will get noticed, and to make good work easier it’s best to go with what you’re passionate about. The other thing I would suggest is not beating yourself up or comparing your output to others’. Do what you can when you want to; this is a hobby after all. If you make yourself miserable trying to do something, your output will naturally suffer or you might even burn out and stop altogether. Like I said before, we need all the help we can get, so try to avoid that if at all possible.

g: Thank you for being here, overlord Hubz. It’s been good talking to you.
H: No problem! Thanks for the interview. 🙂

– glmdgrielson, being a very good minion interviewer

YouTube Attributions to be removed in September
https://datahorde.org/youtube-attributions-to-be-removed-in-september/
Sat, 28 Aug 2021 22:59:17 +0000

On August 18, YouTube quietly announced that due to “low usage”, they will be removing video attribution pages. One version of the announcement said that this will happen in “early September” and another said “after September”. YouTube instead recommends using the description to attribute videos.

Video attribution pages were intended to list which videos were used to make the current video. This created a network of videos, connecting remixes/compilations/shorter versions of videos with their original source videos. These pages also helped ensure that credit was given to the original authors of video clips, even if the original uploader might have forgotten to do so.

Until some point between 2017 and 2019, video attribution pages also listed the videos that used the current video. The attributions were automatically associated with a video when someone used the online YouTube video editor to add a Creative Commons-licensed clip to their video. If a video had attributions, a link to its attributions page would automatically be placed below its description. On the mobile YouTube app, this link would open the attributions page in the user’s web browser, but more recently all of the attribution links in the mobile app would instead open the channel that claimed the “Attribution” custom URL.

The video attributions page is one of the oldest pages on YouTube, and is believed to be the last page on YouTube that still uses the old, pre-Polymer layout. In fact, the HTML content of the attribution web pages (excluding headers, footers, and video thumbnail overlays) has not been modified since 2011!

No formal archival efforts have been initiated as of this time, but it is anticipated that one will start soon.

Community Spotlight: Dead Game News
https://datahorde.org/community-spotlight-dead-game-news/
Mon, 19 Oct 2020 17:12:35 +0000

Who are they?

Dead Game News (DGN) is a group dedicated to reporting on games which are no longer available to consumers, or which are at risk of becoming unavailable. Games that are dying or dead, as it were.

What do they do?

DGN is rather unique among preservation communities, being geared towards recent or ongoing events. You can expect them to report offline games being delisted, or servers shutting down for multiplayer games.

Beyond reporting dying games, they also work to spread awareness of issues in the games industry which hurt the lifespan of many games. These include addressing pitfalls in DRM (Digital Rights Management) tools and “games as a service” practices.

How do they do it?

They are most active on their Discord server, which is a hub for exchanging news and fostering discussion.

Occasionally they might tweet about notable dying games on their Twitter account. Rarely, you might see a DGN video on Accursed Farms, where DGN first originated.

How do I sign up?

Just hop on board their Discord Server! Or, if you would like to just follow the most important headlines, give @deadgamenews a follow.

So what are you waiting for? Become a Game Mortician, today!


Looking to discover other archiving communities? Just follow Data Horde’s Twitter List and check out our other Community Spotlights.

October Status Update on the Save Yahoo Groups! Project
https://datahorde.org/october-status-update-on-the-save-yahoo-groups-project/
Thu, 15 Oct 2020 23:00:35 +0000

Last November, Yahoo announced that they would be shutting down many key features on the ancient Yahoo Groups. There was a major project to rescue data, led by Archive Team and fandoms who traced their origins to Yahoo Groups. In fact, we had written all about it back in January:

The story did not end there, however. So let’s talk about what has transpired since…


Despite us even reporting 30 January as the final deadline, Yahoo continued to accept Get My Data (GMD) requests for about a week, and active efforts ceased around that time. Then came the waiting game, as it took a few more weeks for some of those GMD requests to process.

By late February, most of the volunteers had disbanded or moved on to other projects. But there was still much to be done. For one thing, people had rushed so much to grab everything they could that a lot of these group files were a total mess, not made any better by how Yahoo’s GMD exports worked. So the remaining volunteers stuck around to label their massive collection.

Doranwen, one of the leads on the Yahoo-Geddon (aka Save Yahoo Groups) project, frequently documented their progress during this time.

A few numbers and random other bits of info:

~2 TB of fandom data saved (that I know of, for now)
~200,000 confirmed fandom groups saved in some fashion
~2,000 Sims groups saved* …

*The only reason I know the Sims number is because I was tracking those groups on Google spreadsheets in order to find all of them and get volunteers to join them. For other fandoms it’s impossible to give any sort of number at this point (although I know there was a ton of LOTR, HP, Buffy, and Westlife, lol). Yahoo’s categorization was terrible and a group name doesn’t always give good clues as to whether it’s fandom/non-fandom. Getting that sort of data will take a good deal of time and work.

Doranwen, The end of Yahoo Groups – a few thoughts & stats

Another issue was that the collection was not actually unified. Archive Team had also archived a bunch of data, so the Yahoo-Geddon team continued to label those batch by batch for a few more months.

It truly is endless!!

Yahoo-Geddon volunteer, 14 July 2020

Yet another reason the Yahoo-Geddon team was taking so long was how meticulous they were. They worked not only to curate this collection for the sake of archiving and to trace the history of fandom, but also to provide a rich dataset that researchers might want to use in the future.

-[Stage] 4.5b: Remember that we got a bunch of groups from scrounging the links of other groups for new groups to join? Some of the commands used to process that data generated “groups” that never existed (with http: stuck at the end, apostrophes or commas in them, etc.). Also one stage of the spreadsheet work ended up with a certain number of groups getting a duplicate version added to the spreadsheet with _dupe after the name.

So for this stage I send the spreadsheets to my assistant who runs a script against them to find groups with punctuation in them or _dupe at the end. A very very tiny number of very old (grandfathered from who knows which list service) groups actually legitimately have periods in their names, but in most cases groups with periods never existed either.

This process is fairly quick for each letter but varies greatly in what has to be done, as sometimes group folders are affected (and for some punctuation marks, Yahoo simply ignored everything from the mark onwards and treated the letters before it as the group name).

Yahoo Groups metadata processing steps, stage 4.5b
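
As an illustration, here is a minimal sketch of what such a name-cleanup filter might look like (a hypothetical reconstruction, not the team’s actual script), assuming the group names sit in a plain list, one per line:

    import re

    # Hypothetical sketch: flag Yahoo group names that likely never existed.
    # Real group names used letters, digits, underscores and hyphens; stray
    # apostrophes, commas, colons (e.g. a trailing "http:") or a "_dupe"
    # suffix mark artifacts of the scraping/spreadsheet stages described
    # above. Caveat: a very small number of grandfathered groups really do
    # contain periods, so period matches still need a manual check.
    SUSPECT = re.compile(r"[',:.]|_dupe$")

    def flag_suspect_groups(names):
        """Yield names containing stray punctuation or a _dupe suffix."""
        for name in names:
            if SUSPECT.search(name.strip()):
                yield name

    sample = ["buffy_fans", "lotr-list_dupe", "hp,fic", "sims.exchange"]
    print(list(flag_suspect_groups(sample)))
    # -> ['lotr-list_dupe', 'hp,fic', 'sims.exchange']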

Sadly, Yahoo!, blind as ever to Yahoo-Geddon’s efforts, has decided to permanently shut down Yahoo Groups. While Yahoo Groups only retained its bare-bones features, this will put an end to some decades-old mailing lists…

On a related note, an interesting discovery Yahoo-Geddon made is that Yahoo actually has not deleted archives, photos, and files, but only removed public access.

The files are still there, from what I can tell! They’ve just blocked us from getting to them.

The monthly reminder emails with attachments are still coming in – and the attachments come from files in the files sections. Clearly those were never removed.

Which means that Yahoo could have chosen to grant us access to all of that for a full year before closing Groups entirely, but did not.

via the Save Yahoo Groups Discord server

Just goes to show that curation is one half of archiving/preservation… If you would like to learn more, or even participate in the Yahoo Group dissection, check out the Save Yahoo Groups Discord server: https://discord.com/invite/DyCNddf

YouTube is hiding Attributions to Fan-Captioners and Translators who wanted to be credited
https://datahorde.org/youtube-is-hiding-attributions-to-fan-captioners-and-translators-who-wanted-to-be-credited/
Thu, 15 Oct 2020 09:00:46 +0000

On YouTube you sometimes come across videos which have subtitles for a bunch of languages. Take for instance this Japanese music video with translations in 20 languages! Have you ever asked yourself where these come from? Is the uploader a polyglot or something?

These translations were contributed by fans of the channel, and if you had gone into the video description a few days ago, you would have seen authors listed for some – but not all – of the languages. However, if you check the description now, you will notice that YouTube is hiding all of these caption authors.

And to make matters worse, there is a good reason why the original list did not have all translators listed. To show up on the list, contributors had to check a box titled Credit my contribution, which was turned off by default. So anyone who was showing up on the list had explicitly volunteered to appear non-anonymously.

This comes following YouTube’s deprecation of the community contributions feature. While YouTube has assured users that they will keep published translations online, it would seem that they do not wish translators to receive any credit for these captions beyond this date.

If you as a captioner or content creator have been adversely affected by the removal of community contributions, check out our YouTube Captioner’s Toolkit for alternatives and useful resources.

[Obsoleted] YouTube removed community translations, but there is a workaround!
https://datahorde.org/how-to-submit-accept-community-translations-on-youtube-a-work-around/
Mon, 12 Oct 2020 20:04:36 +0000

Edit: On October 28 around 8 PM (GMT), the old caption editor was shut down for good, blocking off further contributions. For external alternatives to community captioning, check out our Captioner’s Toolkit page:

For the past few years, YouTube had supported community captions, a feature which allowed users to submit captions or translations for other channels’ videos. On September 28 the feature was removed and the menu to access it was hidden.

However, you might still spot new videos with community contributions published after September 28. Take a look at this video uploaded on October 9. Notice that the Caption author is Dark_Kuroh, different from the video uploader.

But how? Time travel? As it so happens, even with all the menus hidden, it is still possible to access the old captions editor. This method requires the uploader to know where to check, so if you are submitting captions or translations this way, it’s best to let the uploader know the language and the video.


How to submit community captions

So you want to caption or translate someone else’s video… Assuming that the channel still has community captions enabled, go to the following URL:

youtube.com/timedtext_editor?action_mde_edit_form=1&v={video code}&lang={language code}

Example: http://youtube.com/timedtext_editor?action_mde_edit_form=1&v=vCxz2lSeer4&lang=en

where {video code} is the video’s ID (the v= parameter at the end of its watch URL) and {language code} is the abbreviation for the language you want to translate into. Fortunately, you can also switch between languages later, so if you don’t know the abbreviation you can use en to open the editor for English and then switch to your actual language through the Switch Language button.
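
To avoid typos when assembling these links by hand, here is a minimal sketch of a helper that builds the editor URL (a hypothetical convenience script, not an official tool; the URL pattern is the one given above):

    from urllib.parse import urlencode

    # Build the legacy community-caption editor URL for a video.
    # action_mde_edit_form=1 is the parameter that opens the old editor.
    def caption_editor_url(video_id: str, lang: str = "en") -> str:
        query = urlencode({"action_mde_edit_form": 1, "v": video_id, "lang": lang})
        return "https://www.youtube.com/timedtext_editor?" + query

    # The example from above: the English editor for video vCxz2lSeer4
    print(caption_editor_url("vCxz2lSeer4"))
    # -> https://www.youtube.com/timedtext_editor?action_mde_edit_form=1&v=vCxz2lSeer4&lang=en

The same URL works for uploaders checking submissions on their own videos, as described below.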

When you’re done, don’t forget to submit by clicking on the Submit Contributions button in the upper right corner.


How to accept community captions

Previously, you were able to view community submissions from the Community tab in YouTube Studio. Unfortunately, these are now hidden, so you will need to have an idea of which videos and languages to check.

If you hadn’t enabled community contributions before, it’s not too late! Simply go into YouTube Studio > Videos and choose the videos you would like to enable contributions on. Go into Edit > Community Contributions and switch it on. Lastly, don’t forget to click on “Update Videos”.

You, as an uploader, can also use youtube.com/timedtext_editor?action_mde_edit_form=1&v={video code}&lang={language code} to access the caption/translation submissions you have received. A good place to start could be some of your most-viewed videos, and you should definitely pay attention to your subscribers, in case they are trying to tell you to accept their submissions.

All you have to do when you find a community submission is click on the Publish or Publish edits button in the upper right corner.


While YouTube is still working on their permissions system and the community is banding together to find alternatives of their own, it’s important to endure through this transition period. So here’s hoping this tutorial helps you continue to add and receive translations on your videos for a little longer…
