yahoo – Data Horde https://datahorde.org Join the Horde! Tue, 18 Jan 2022 18:19:59 +0000 en-US hourly 1 https://wordpress.org/?v=6.4.3 https://datahorde.org/wp-content/uploads/2020/04/cropped-DataHorde_Logo_small-32x32.png yahoo – Data Horde https://datahorde.org 32 32 This Week In Archiving 07/05/2021 https://datahorde.org/this-week-in-archiving-07-05-2021/ https://datahorde.org/this-week-in-archiving-07-05-2021/#respond Mon, 05 Jul 2021 22:16:45 +0000 https://datahorde.org/?p=2464 Nexus Mods introduces mandated archiving for long-time storage of game mods, the scramble for unlisted videos rages on and VG preservationists manage to read a rare 2002 e-Reader card!


Shutdowns


Webs, the once popular website host, was expected to shut down after 20 years, at the end of June. Fortunately, Archive Team is on the case, and double fortunately it seems that sites are still up, even today on 5 July. Archive stats available here.

Microsoft’s Codeplex archive was expected to shut down some time in July. I am happy to report that Codeplex has already been fully scanned and Archive Team’s collection is available here.

Unlisted YouTube Video Scramble

With 18 days to go before YouTube forcibly privates old unlisted videos and playlists on July 23, many archiving projects are being run concurrently.

Chart: Unlisted YouTube Video Scramble Status (5 July 2021); link crawlers are on the left, and video/metadata archivers on the right.

There has been much activity on certain subreddits for sniffing out links to unlisted videos. These include r/speedrun and speedrun.com collecting unlisted speedruns, r/nerdfighters backing up unlisted Vlogbrothers content, r/homestuck chasing after fan videos, besides the usual suspects from r/datahoarder.

Our fellow archivist Jopik recently published his own unlisted video collection on filmot.com. Privately collected from various corners of the internet over the past 2-3 years, this collection contains links and metadata on several million unlisted YouTube videos. Members of Flashpoint and The Eye have taken note of the collection and have begun archiving video files of interest.

Finally, Archive Team is working on a major project for collecting metadata for unlisted video links on hackint#down-the-tube and a minor project for saving notable videos on hackint#youtubearchive. Some of Archive Team’s unlisted videos are being provided by Sponsorblock, which continues to ask users to submit unlisted video URLs.

Forthcoming projects include Omniarchive’s rush to grab unlisted Minecraft footage and the Distributed YouTube Archive prioritizing a queue of unlisted videos…

Other Updates

The popular modding community Nexus Mods has taken a decision to make permanent archives of all mods hosted on their site. While the decision has been celebrated by web preservationists the world over, it has also drawn ire from some modders who are upset that they have lost the freedom to delete their mods, as reported by Kotaku.

This change is intended to lay the groundwork for Nexus’ upcoming collections system, which will allow modders to combine different mods for the same game. Obviously, if the dependencies of a collection were deleted the entire collection could break, left-pad style. Community manager BigBizkit made it clear that as part of Nexus’ “noble” mission to make modding easier, it was essential that modders be able to build upon previous work.

Let me stress that even without collections in the picture, file deletions and disappearing data constitute a problem and create a development environment that cannot serve as a strong foundation for the future of our platform. 

BigBizkit, An important notice and our future plans for collections

As a compromise for those not fond of the change, modders have a month-long grace period to delete any files/mods that they do not want indefinitely preserved.

Yahoo! Answers might have shut down a couple of months ago, but a substantial portion of this former hallmark of the web was saved. Unfortunately, said massive collection is not very human-readable. There are talks on hackint#yahooanswersgraveyard to build a “Yahoo! Answers Archive” search engine akin to the YouTube Community Contributions search tool. Get hyped and stay tuned!

Discoveries

Hit Save! and the e-Reader shared a number of discoveries over the past few weeks. Notably, they managed to acquire a rare 2002 Battle Road trophy card and play it on an e-Reader.

What is a Battle Road trophy card, you ask? These “trophies” were awarded at official Pokémon TCG tournaments held across Japan. The cards came in several flavors: with a male or female trainer, with different Pokémon in the background, and even showing where the winner had placed in the tournament at which they earned the card.

Though serving no gameplay effect in the card game, these cards were readable by Nintendo’s now-defunct e-Reader GBA extension. Card Capto- (ahem) Collector Qwachansey was kind enough to send a card he had acquired to the folks at Hit Save, and now for the first time ever you can see the funny little message you got for scanning these trophy cards.

The full blog post has a lot of other interesting details on how the team built a modified e-Reader to be able to capture footage, as well as some other interesting cards they have come into possession of recently. Read about The Ren-e-ssance for yourself here!

]]>
https://datahorde.org/this-week-in-archiving-07-05-2021/feed/ 0
How to recover your Yahoo! Groups from the Internet Archive https://datahorde.org/how-to-recover-your-yahoo-groups-from-the-internet-archive/ https://datahorde.org/how-to-recover-your-yahoo-groups-from-the-internet-archive/#comments Mon, 31 May 2021 14:13:40 +0000 https://datahorde.org/?p=2293 Yahoo! Groups, once upon a time a hub to many online communities, was shut down in 2020. Yahoo! Groups used to host mailing lists going as far back as 1997, and perhaps you may have once been a part of it yourself. Users were offered a Get Your Data tool to download their messages and other data, prior to the shutdown, but many people were unable to respond on short notice.

Thankfully, owing to the efforts of the Save Yahoo Groups Project and Archive Team the data of many groups has been preserved. If you missed out on the GYD tool, you might still be able to retrieve your groups’ data by following the steps below.


To begin, can you remember your group’s name? If yes, the following steps will go by a lot faster; but if not, you might want to make a list of potential names to go by. Was the name of your group Fireflylovers, or Firefliers, or LoversofFF? Write down all likely candidates.

For demonstration’s sake let’s search for data on NFforKids, a non-fiction writing group.

Let’s perform a metadata search, to see when NFforKids was started. Head over to the Yahoo Groups Metadata Collection page on the Internet Archive. Ignoring the no preview warning, either click on Show all files or scroll down until you see DOWNLOAD OPTIONS on the right side of the page.

Click on COMMA-SEPARATED VALUES, to reveal a list of files. Since NFforKids starts with an N, if it does exist, it will be indexed under master_N.csv. Download this CSV file to your device.

You can now open this CSV file using Excel or another spreadsheet program. Search for NFforKids to find the corresponding information row. What do you know? NFforKids was started on 11 June 2000. You can scroll across this row to find the group’s primary language, the category of the group, whether the group was public or not, and more!
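If the CSV is too large for your spreadsheet program, a short script can do the same lookup. The following is only a sketch: the column names in the sample (`name`, `founded`, `language`) are made up for the demo and the real master_N.csv may be laid out differently, which is why the function simply checks every cell of every row for the group name.

```python
import csv
import io


def find_group_rows(csv_file, group_name):
    """Return every row in which some cell equals group_name, ignoring case."""
    wanted = group_name.strip().lower()
    reader = csv.reader(csv_file)
    header = next(reader, [])  # first line is assumed to hold column names
    return [
        dict(zip(header, row))
        for row in reader
        if any(cell.strip().lower() == wanted for cell in row)
    ]


# Tiny invented sample standing in for master_N.csv; the real column
# names and date format may differ.
sample = io.StringIO(
    "name,founded,language\n"
    "NFforKids,2000-06-11,English\n"
    "NFLfans,2001-01-01,English\n"
)
rows = find_group_rows(sample, "nfforkids")
print(rows)
```

To run it against the real download, replace the sample with `open("master_N.csv", newline="", encoding="utf-8")`. The encoding is a guess on my part, so adjust it if you see garbled characters.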

If you weren’t able to find metadata on your group, it’s time to pull up that list I told you to make above. Fall back to the other candidates and try another name. If the first letter (or two) of this second name is different, you will need to download the corresponding CSV file before resuming your search.

Please note that while the Yahoo! Groups collections on the Internet Archive are thorough, they are NOT exhaustive. It is entirely possible that data on your group might have been missed. That being said, the metadata collection sports a whopping 1.1 million groups. Even if you weren’t able to find your group in the first round, it is quite possible that you simply misremembered the name, so keep on trying!


Once you have confirmed the name of your group, and that it has been catalogued in the Metadata Collection, you can then download the corresponding TAR file, which contains even more details. Again, if we’re looking for a group called NFforKids we’ll be looking for the first two letters from the list. That’s NF.tar for NFforKids.
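If you are working through a whole list of candidate names, it can help to work out up front which files you will need. Here is a minimal sketch of the naming pattern described above (first letter for the CSV, first two letters for the TAR). The helper name `archive_filenames` is hypothetical, and the upper-casing is an assumption based on the NFforKids example, so double check against the actual file list on the item page.

```python
def archive_filenames(group_name):
    """Guess the metadata CSV and TAR file names for a group name.

    Assumes the pattern from the NFforKids example: master_<first letter>.csv
    and <first two letters>.tar, both upper-cased.
    """
    name = group_name.strip()
    if len(name) < 2:
        raise ValueError("group name needs at least two characters")
    return f"master_{name[0].upper()}.csv", f"{name[:2].upper()}.tar"


# Candidate names from earlier in this guide.
candidates = ["Fireflylovers", "Firefliers", "LoversofFF", "NFforKids"]
for candidate in candidates:
    print(candidate, archive_filenames(candidate))
```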

If you’re on Mac or Linux, you should be able to open this .tar file to reveal a folder titled media. If you’re on Windows, you can use 7-zip to open it. This TAR file contains the same information as the CSV, plus additional details. Did the group have spam filtering, was media sharing allowed or was the group text-only? You might even find the URL for group images, although unfortunately most of those links are now dead.
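If you would rather not unpack the whole archive, Python's built-in `tarfile` module can list just the entries that mention your group. This is a sketch under assumptions: I am guessing that the group name appears somewhere in the member path, and the file names in the demo archive are invented.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def members_for_group(tar_path, group_name):
    """List archive members whose path mentions group_name (case-insensitive)."""
    wanted = group_name.lower()
    with tarfile.open(tar_path) as tar:
        return [m.name for m in tar.getmembers() if wanted in m.name.lower()]


# Demo on a throwaway archive; the file names inside are invented.
with tempfile.TemporaryDirectory() as tmp:
    demo_tar = Path(tmp) / "NF.tar"
    with tarfile.open(demo_tar, "w") as tar:
        for name in ("media/NFforKids.json", "media/NFLfans.json"):
            payload = b"{}"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    found = members_for_group(demo_tar, "NFforKids")
print(found)
```

With the real download you would call `members_for_group("NF.tar", "NFforKids")` directly.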

The Cover for the Star Trek: New Frontier Fanfiction group, one of few group covers preserved in the metadata collection.

Stats are fine and dandy, but what about messages or activity? If your group was restricted, tough luck, you’ll need to find a member who made a GYD copy before the shutdown. This is where our luck with NFforKids has run out, seeing as chats of the group were not public. For the final step, let’s switch to a public group whose history is visible. We’ll go with nfwritersontheirwayup. Messages in this group were visible to all subscribers, so archivists were able to grab its contents.

Raw data collections are stored in assorted, non-alphabetic, batches. To see if a group has its raw data available on the Internet Archive, simply query subject:"yahoo groups" nfwritersontheirwayup. If you get any results, your group’s raw data is most likely located here. You can double check the item description to be sure that nfwritersontheirwayup is indeed included in the batch.
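The same query can also be sent to the Internet Archive's advanced search endpoint, which is handy if you have many group names to check. The sketch below only builds the URL; the choice of returned fields (`identifier`, `title`) and the `rows` limit are my own picks for illustration, not something this guide prescribes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def yahoo_groups_search_url(group_name, rows=50):
    """Build an archive.org advanced-search URL for the query used above."""
    query = f'subject:"yahoo groups" {group_name}'
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return; chosen for illustration
        ("fl[]", "title"),
        ("rows", rows),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


url = yahoo_groups_search_url("nfwritersontheirwayup")
print(url)
```

Open the printed URL in a browser, or fetch it with your HTTP client of choice, to get the matching items back as JSON.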

Pop open the WEB ARCHIVE GZ download option from the left side of the page. Scroll down until you see nfwritersontheirwayup.bcqkJvN.warc.gz and proceed to download. To unpack this gzip you can use the gzip -d nfwritersontheirwayup.bcqkJvN.warc.gz command on Unix systems or good old 7-zip on Windows.
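If you have Python installed, the standard library can do the unpacking on any operating system. This is a rough equivalent of `gzip -d`, with one difference: it leaves the original .gz file in place. The helper name `gunzip` and the demo file are just for illustration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress src (e.g. foo.warc.gz) to dest (default: foo.warc)."""
    src = Path(src)
    dest = Path(dest) if dest else src.with_suffix("")  # drops the ".gz"
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so large files are fine
    return dest


# Demo on a throwaway file so the snippet runs without any download.
with tempfile.TemporaryDirectory() as tmp:
    packed = Path(tmp) / "demo.warc.gz"
    with gzip.open(packed, "wb") as f:
        f.write(b"WARC/1.0\r\n")
    unpacked = gunzip(packed)
    unpacked_name = unpacked.name
    unpacked_bytes = unpacked.read_bytes()
print(unpacked_name)
```

For the real file, `gunzip("nfwritersontheirwayup.bcqkJvN.warc.gz")` would write nfwritersontheirwayup.bcqkJvN.warc next to it.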

Last but not least, you’ll need a WARC viewer. If this is your first time with WARCs, replayweb.page is very straightforward and runs right in your browser. Simply upload the WARC contents of the group and voila, you can now navigate through the group’s chat logs.


Recovering your Yahoo! Groups from yesteryear is as simple as that. Got any questions? Or perhaps you have made some worthwhile discoveries while group hunting. Comment below!

]]>
https://datahorde.org/how-to-recover-your-yahoo-groups-from-the-internet-archive/feed/ 28
Help Archive Team Save Yahoo! Answers! https://datahorde.org/help-archive-team-save-yahoo-answers/ https://datahorde.org/help-archive-team-save-yahoo-answers/#comments Thu, 22 Apr 2021 02:35:47 +0000 https://datahorde.org/?p=2207 Yahoo! Answers is shutting down on May 4th, 2021, taking nearly 15 years worth of content with it!

Archive Team is trying to save as much of it as possible, and you can help!

By setting up the Archive Team Warrior and letting it run in the background, you can back up questions and answers from Yahoo! Answers and make them available in the Internet Archive Wayback Machine. The Archive Team Warrior is easy to set up and uses very few of your system resources. The Archive Team Warrior can work on up to 6 items concurrently.

Advanced users can also run the project with Docker using the atdr.meo.ws/archiveteam/yahooanswers-grab Docker image, which can easily be deployed on large networks and allows for running projects at a higher concurrency rate per container (maximum 20 concurrent items, though users running the project with this many concurrent items might be rate-limited by Yahoo!).

If you need any help or have any questions about the project, please feel free to refer to the project page on the Archive Team Wiki or ask in Archive Team’s IRC channel for the Yahoo! Answers project. (Please be patient and stay connected if your question isn’t immediately answered so you don’t miss any responses.)

]]>
https://datahorde.org/help-archive-team-save-yahoo-answers/feed/ 1
Why Do We Need Proactive Archiving? Yahoo! Answers https://datahorde.org/why-do-we-need-proactive-archiving-yahoo-answers/ https://datahorde.org/why-do-we-need-proactive-archiving-yahoo-answers/#respond Sat, 10 Apr 2021 22:18:47 +0000 https://datahorde.org/?p=2192 In 2016, a rumor was circulating that Yahoo! Answers might be shutting down. Rumor or not, this was once the most popular Q&A site on the internet, so archivists did not take any chances. Soon after, Archive Team sprang into action, grabbing over 30 TB worth of data. This included questions and answers in various languages, posted between 2005 and 2016.

Fortunately, by 2017 it became clear that such a shutdown would not be happening and the internet breathed a sigh of relief…


Come April 2021, Yahoo! announces that Answers actually will be shutting down within a month. The hastily published FAQ page tells users that the site will go read-only by 20 April, but does not give any reasons for the shutdown. The closest thing to an explanation is an excerpt from an e-mail sent to registered members, as reported by The Verge:

While Yahoo Answers was once a key part of Yahoo’s products and services, it has become less popular over the years as the needs of our members have changed. To that end, we have decided to shift our resources away from Yahoo Answers to focus on products that better serve our members and deliver on Yahoo’s promise of providing premium trusted content.

Had things gone differently, we might have been facing a much more grim situation, trying to cram 16 years of web history into a few weeks. Conveniently, archivists are not empty-handed, with the 2016 grab already under Archive Team’s belt. Not to mention that the scripts used for the previous grab were also ready to use after some retooling.

To grab the 5 years’ worth of content in between, Archive Team has already set a project in motion. Archive Team has already grabbed a few hundred gigs’ worth of data and you can follow the project status on #noanswers on HackInt.


Yahoo! Answers was a forum where many questions found their answers throughout the Web 2.0 era. And now, one final question Yahoo! Answers answers is what merit there is to proactive archiving. Even a 15-year-old website may have as little as a month between announcement and shutdown. “Today it is here, tomorrow it will stay”, is no longer a healthy assumption in this volatile age of the web. We need backups before the warning siren.

So let us smile at this happy ending as Yahoo! Answers will trustfully be preserved. Now if you’ll excuse me, it’s time for me to brush up on Ouija Boards.

]]>
https://datahorde.org/why-do-we-need-proactive-archiving-yahoo-answers/feed/ 0
Yahoo! Groups Archive Metadata Now Available https://datahorde.org/yahoo-groups-archive-metadata-now-available/ https://datahorde.org/yahoo-groups-archive-metadata-now-available/#comments Sun, 06 Dec 2020 13:40:00 +0000 https://datahorde.org/?p=1849 After months of work and preparation, the metadata for over 1.1 million Yahoo! Groups retrieved by Archive Team’s Python script as well as from other grabs has been organized and is now available on the Internet Archive. Special thanks to Doranwen for organizing this data.

Yahoo! Groups’ mailing lists, which are the last remaining part of Yahoo! Groups, will be shutting down in 10 days, on December 15, 2020. However, since group content is no longer accessible to the public, there is little left to archive.

Next year, volunteers will be needed to sort and organize the full group data so related groups can be uploaded to the Internet Archive together. This will make it easier to access and browse archives for multiple groups related to similar topics.

For more information about Yahoo! Groups, please see Doranwen’s blog or our Yahoo! Groups articles.

]]>
https://datahorde.org/yahoo-groups-archive-metadata-now-available/feed/ 10
October Status Update on the Save Yahoo Groups! Project https://datahorde.org/october-status-update-on-the-save-yahoo-groups-project/ https://datahorde.org/october-status-update-on-the-save-yahoo-groups-project/#respond Thu, 15 Oct 2020 23:00:35 +0000 https://datahorde.org/?p=1631 Last November, Yahoo announced that they would be shutting down many key features on the ancient Yahoo Groups. There was a major project to rescue data, led by Archive Team and fandoms who traced their origins to Yahoo Groups. In fact, we had written all about it back in January:

The story did not end there however. So let’s talk about what has transpired since…


Although we had reported 30 January as the final deadline, Yahoo continued to accept Get My Data (GMD) requests for about a week longer, so active efforts ceased around that time. Then came the waiting game, as it took a few more weeks for some of those GMD requests to process.

By late February, most of the volunteers had disbanded or moved on to other projects. But there was still much to be done. For one thing, people had rushed so much to grab everything they could that a lot of these group files were a total mess, not made any better by how Yahoo’s GMD exports worked. So the remaining volunteers stuck around to label their massive collection.

Doranwen, one of the leads on the Yahoo-Geddon (aka Save Yahoo Groups) project, frequently documented their progress during this time.

A few numbers and random other bits of info:

~2 TB of fandom data saved (that I know of, for now)
~200,000 confirmed fandom groups saved in some fashion
~2,000 Sims groups saved* …

*The only reason I know the Sims number is because I was tracking those groups on Google spreadsheets in order to find all of them and get volunteers to join them. For other fandoms it’s impossible to give any sort of number at this point (although I know there was a ton of LOTR, HP, Buffy, and Westlife, lol). Yahoo’s categorization was terrible and a group name doesn’t always give good clues as to whether it’s fandom/non-fandom. Getting that sort of data will take a good deal of time and work.

Doranwen, The end of Yahoo Groups – a few thoughts & stats

Another issue was that the collection was not actually unified. Archive Team had also archived a bunch of data, so the Yahoo-Geddon team continued to label those batch by batch for a few more months.

It truly is endless!!

Yahoo-Geddon volunteer, 14 July 2020

Yet another reason the Yahoo-Geddon team was taking so long was because of how meticulous they were. They worked to not only curate this collection for the sake of archiving, not only to trace the history of fandom, but also to be able to provide a rich dataset that researchers might want to use in the future.

-[Stage] 4.5b: Remember that we got a bunch of groups from scrounging the links of other groups for new groups to join? Some of the commands used to process that data generated “groups” that never existed (with http: stuck at the end, apostrophes or commas in them, etc.). Also one stage of the spreadsheet work ended up with a certain number of groups getting a duplicate version added to the spreadsheet with _dupe after the name.

So for this stage I send the spreadsheets to my assistant who runs a script against them to find groups with punctuation in them or _dupe at the end. A very very tiny number of very old (grandfathered from who knows which list service) groups actually legitimately have periods in their names, but in most cases groups with periods never existed either.

This process is fairly quick for each letter but varies greatly in what has to be done, as sometimes group folders are affected (and some punctuation marks Yahoo simply ignored everything from that mark onwards and treated the letters before it as a group name).

Yahoo Groups metadata processing steps, stage 4.5b
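To give a feel for the kind of check described in stage 4.5b, here is a rough sketch. It is not the team's actual script. The set of allowed characters (letters, digits, underscore, hyphen) is my own assumption about what a plausible group name looks like, and the sample names are invented. Anything flagged would still need a human look, since a few old groups legitimately had periods in their names.

```python
import re

# Assumed shape of a plausible group name; the real rules may differ.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def flag_reason(name):
    """Return why a group name looks suspicious, or None if it looks fine."""
    if name.endswith("_dupe"):
        return "duplicate suffix"
    if not ALLOWED.fullmatch(name):
        return "punctuation"
    return None


# Invented examples of the problems mentioned in the quote above.
names = ["NFforKids", "NFforKids_dupe", "somegrouphttp:", "fan's_group", "old.list"]
flagged = {}
for name in names:
    reason = flag_reason(name)
    if reason:
        flagged[name] = reason
print(flagged)
```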

Sadly, Yahoo!, blind as ever to Yahoo-Geddon’s efforts, has decided to permanently shut down Yahoo Groups. While Yahoo Groups only retained its bare-bones features, this will be putting an end to some decades-old mailing lists…

On a related note, an interesting discovery Yahoo-Geddon made is that Yahoo actually has not deleted archives, photos and files but only removed public access.

The files are still there, from what I can tell! They’ve just blocked us from getting to them.

The monthly reminder emails with attachments are still coming in – and the attachments come from files in the files sections. Clearly those were never removed.

Which means that Yahoo could have chosen to grant us access to all of that for a full year before closing Groups entirely, but did not.

via the Save Yahoo Groups Discord server

Just goes to show that curation is one half of archiving/preservation… If you would like to learn more or even participate in Yahoo Group dissection, check out the Save Yahoo Groups Discord server: https://discord.com/invite/DyCNddf

]]>
https://datahorde.org/october-status-update-on-the-save-yahoo-groups-project/feed/ 0
Saving Private Groups: This Time the Mission is the Fan https://datahorde.org/saving-private-groups-this-time-the-mission-is-the-fan/ https://datahorde.org/saving-private-groups-this-time-the-mission-is-the-fan/#comments Fri, 31 Jan 2020 04:19:00 +0000 In a little more than 24 hours Yahoo Groups will be biting the dust. If you were once a member of a community, or perhaps own and/or know an owner of a restricted or private group which you’d like to save, I urge you to contact [email protected] or join the “Save Yahoo Groups” Discord Server: https://discord.gg/DyCNddf . Without further ado, the Fandom Rescue Story…
 
Ah, the late 90’s! A time with dial-up internet, people ranting about wasting too much time in front of their televisions instead of on their phones, and this book called Harry Potter about witches and wizards or something. It was a different time in many ways and it’s a bit frightening how fast things have changed considering how chronologically recent it was. And yet, some things were quite similar, but the way one went about doing them was kind of different. Take for instance what you would do in your free time…
 
So you want to socialize online in 1997 huh? Unfortunately you don’t really have anything like Twitter or Reddit; maybe you could go on Usenet or IRC, if you don’t mind having an unreliable chat log or none at all. Then why not join a mailing list? It’s perfect for discussing lizards with herpetology nerds or sharing your Thundercats fan-fiction with fans of the show.
 
This age, the age of the mailing list, was a time when many online communities flourished, particularly fandom. What had previously been restricted to (maga)“zines” and conventions had finally started to gain traction online. A mailing list at the time was a luxury akin to the Slack/Discord servers of today: you could start a group with people who had a common interest without having to go through the tedious process of setting up new hardware, and people got messages and notifications delivered straight to their inboxes. This rapid notification system allowed for communities that would no longer “sleep”; you would have updates almost 24/7. And as one can imagine, having such a party that never ends was incredibly addictive. Through word-of-mouth, mailing list providers such as OneList, EGroups and finally Yahoo! Groups soared in popularity.
 
“Our Group has been chosen to participate in the Yahoo! Groups Beta Program and all of the features that Yahoo! is contemplating have been incorporated into our Group. Threads can now be linked as Conversations and are searchable. Posting photos and links is now much easier.”
– Post on a group blissfully unaware of their eventual demise
 
 
Time, however, was cruel to the mailing list. The last giant to survive the era was the aforementioned Yahoo! Groups, which tried to modernize with its web interface but was unable to keep up with the rapid growth of technology at the time. Eventually users began migrating to newer websites, and by 2015 the website resembled a ghost town.

 
Yahoo groups post date.png

(Image taken from: https://www.archiveteam.org/index.php?title=Yahoo!_Groups)

Still, many fan communities traced their origins to the mailing lists, with older members sometimes recounting terms, stories or jokes that originated in those days to the newer members. It’s safe to say that these groups left behind quite a legacy– which Verizon (Media) recently decided to wipe off the face of the earth.

 
In mid-October of 2019, it was announced that Yahoo Groups would be shut down. What followed was outrage. Although it was true that most of the former user base of Yahoo Groups had indeed moved on to other platforms, members of the early online fandom community saw what was at stake and were some of the first people to spring into action.
 
On October 22nd 2019, Tumblr user zhie started a Discord Server, “Save Yahoo Groups” (link above); the same day, Morgandawn started a Tumblr blog: https://yahoo-geddon.tumblr.com/. These two outlets combined to form Fandom’s Sortie against Verizon’s Yahoo Groups Siege.
 
‘…People here have been doing massive numbers of searches for fandom groups. Of course, some of us already belonged to fandom groups, and in some cases we have people coming in, saying, “These are really great groups that I think should be saved.”

 

As far as I know, there has been no formal archiving project for fandom Yahoo Groups prior to this. During the time that Yahoo Groups was most active, there were fan fiction archives that sometimes duplicated what was at Yahoo Groups. But an enormous amount of fandom content at Yahoo Groups has never been archived.’

– Yahoo Groups Archiving Volunteer 
 
While Archive Team had also gotten involved right off the bat, they stated their goal to be grabbing as many public groups as possible, whereas the fandom community wanted to ensure the survival of their groups, some of which had restricted access (were publicly visible but an invite was needed to join) or were private (not publicly visible).
 
The two teams worked in tandem, with Archive Team providing tools and logistics for backing up the data, and the SYG team working to sniff out the more obscure fandom groups and establish contacts with the restricted/private group owners. Of course both teams played a tremendous role in publicizing the whole event, even managing to secure an extension so that people had more time to back up their data.
 
Both Archive Team and the SYG team made a number of lists of the groups they found. Again keeping to their own focus, Archive Team set out to grab the data from their public group lists, while the SYG team split their group list into tabs, which volunteers would claim and try to gain access to.
 
Many hours of searching, exchanging mails and sleepless nights later, TBs of data have been rescued from certain destruction. Archive Team’s own Tracker reports 2.76 TB of data to have been saved. The SYG team hasn’t fully tallied up their data yet, but has counted the number of groups they’ve retrieved and/or are retrieving from to be around 123K!

The Yahoo Groups Story is a fine tale which shows how different teams with complementary abilities and backgrounds can work together to accomplish things neither could have done as well on their own. If you too would like to become a part of this story, you can head on over to the Discord server and see if you can reach any of the owners that they’re looking for.

]]>
https://datahorde.org/saving-private-groups-this-time-the-mission-is-the-fan/feed/ 5