How to recover your Yahoo! Groups from the Internet Archive
Data Horde (https://datahorde.org), Mon, 31 May 2021
https://datahorde.org/how-to-recover-your-yahoo-groups-from-the-internet-archive/

Yahoo! Groups, once a hub for many online communities, was shut down in 2020. It hosted mailing lists going as far back as 1997, and you may once have been part of one yourself. Before the shutdown, users were offered a Get Your Data (GYD) tool to download their messages and other data, but many people were unable to respond on short notice.

Thankfully, owing to the efforts of the Save Yahoo Groups Project and Archive Team the data of many groups has been preserved. If you missed out on the GYD tool, you might still be able to retrieve your groups’ data by following the steps below.


To begin, can you remember your group’s name? If yes, the following steps will go by a lot faster; but if not, you might want to make a list of potential names to go by. Was the name of your group Fireflylovers, or Firefliers, or LoversofFF? Write down all likely candidates.

For demonstration’s sake let’s search for data on NFforKids, a non-fiction writing group.

Let’s perform a metadata search, to see when NFforKids was started. Head over to the Yahoo Groups Metadata Collection page on the Internet Archive. Ignoring the no preview warning, either click on Show all files or scroll down until you see DOWNLOAD OPTIONS on the right side of the page.

Click on COMMA-SEPARATED VALUES, to reveal a list of files. Since NFforKids starts with an N, if it does exist, it will be indexed under master_N.csv. Download this CSV file to your device.

You can now open this CSV file using Excel or another spreadsheet program. Search for NFforKids to find the corresponding information row. What do you know? NFforKids was started on 11 June 2000. You can scroll across this row to find the group’s primary language, the category of the group, whether the group was public or not, and more!
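If the CSV is too big for your spreadsheet program, or you have a long list of candidate names, a short script can do the searching for you. The following is only a sketch: the function name find_groups is made up for this example, and because the column layout of the metadata files isn't described here, it simply checks every cell of every row for an exact, case-insensitive match.

```python
import csv


def find_groups(csv_path, candidates):
    """Return {candidate: row} for every candidate name found in the CSV.

    Assumption: the group name appears as a whole cell somewhere in its row.
    The column order is not known here, so every cell is checked.
    """
    wanted = {name.lower(): name for name in candidates}
    found = {}
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
        for row in csv.reader(handle):
            for cell in row:
                key = cell.strip().lower()
                if key in wanted and wanted[key] not in found:
                    # Keep the first matching row for each candidate.
                    found[wanted[key]] = row
    return found
```

Calling find_groups("master_N.csv", ["NFforKids"]) would return the NFforKids row if it is in that file. Candidates starting with a different letter need their own CSV file and a separate call.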

If you weren’t able to find metadata on your group, it’s time to pull up that list I told you to make above. Fall back to the other candidates and try another name. If the first letter (or two) of this second name is different, you will need to download the corresponding CSV file before resuming your search.

Please note that while the Yahoo! Groups collections on the Internet Archive are thorough, they are NOT exhaustive. It is entirely possible that data on your group might have been missed. That being said, the metadata collection sports a whopping 1.1 million groups. Even if you weren’t able to find your group in the first round, you may well have misremembered the name, so keep on trying!


Once you have confirmed the name of your group, and that it has been catalogued in the Metadata Collection, you can then download the corresponding TAR file, which contains even more details. The TAR files are listed by the first two letters of the group name, so for NFforKids that’s NF.tar.

If you’re on Mac or Linux, you should be able to open this .tar file to reveal a folder titled media. If you’re on Windows, you can use 7-zip to open it. This TAR file contains the same information as the CSV, plus additional details. Did the group have spam filtering, was media sharing allowed or was the group text-only? You might even find the URL for group images, although unfortunately most of those links are now dead.
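These TAR files cover every group sharing the same two letters, so it can help to list only the entries that mention your group. Here is a rough sketch using Python's built-in tarfile module. The function name members_for_group is invented for this example, and it assumes (without having checked) that the group name appears somewhere in the file paths inside the archive.

```python
import tarfile


def members_for_group(tar_path, group):
    """List archive entries whose path contains the group name.

    Assumption: files inside the TAR are named after the group they
    describe. The match is a case-insensitive substring test, so
    similarly named groups may show up as well.
    """
    needle = group.lower()
    with tarfile.open(tar_path) as archive:
        return [m.name for m in archive.getmembers() if needle in m.name.lower()]
```

For example, members_for_group("NF.tar", "NFforKids") returns the matching paths, which you can then pull out with 7-zip or your file manager.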

The cover for the Star Trek: New Frontier Fanfiction group, one of the few group covers preserved in the metadata collection.

Stats are fine and dandy, but what about messages or activity? If your group was restricted, tough luck, you’ll need to find a member who made a GYD copy before the shutdown. This is where our luck with NFforKids has run out, seeing as chats of the group were not public. For the final step, let’s switch to a public group whose history is visible. We’ll go with nfwritersontheirwayup. Messages in this group were visible to all subscribers, so archivists were able to grab its contents.

Raw data collections are stored in assorted, non-alphabetic, batches. To see if a group has its raw data available on the Internet Archive, simply query subject:"yahoo groups" nfwritersontheirwayup. If you get any results, your group’s raw data is most likely located here. You can double check the item description to be sure that nfwritersontheirwayup is indeed included in the batch.
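If you have several group names to check, you can build the search links ahead of time and open them one by one. This sketch only assembles a URL and makes no network request. The function name search_url is hypothetical, and the https://archive.org/search?query=... address format is an assumption about the Internet Archive's search page, not something taken from this guide.

```python
from urllib.parse import urlencode


def search_url(group):
    """Build an Internet Archive search link for one group name.

    Assumption: the search page accepts the query text in a "query"
    URL parameter.
    """
    query = f'subject:"yahoo groups" {group}'
    return "https://archive.org/search?" + urlencode({"query": query})
```

Pasting the result of search_url("nfwritersontheirwayup") into your browser should be equivalent to typing the query into the search box by hand.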

Pop open the WEB ARCHIVE GZ download option from the left side of the page. Scroll down until you see nfwritersontheirwayup.bcqkJvN.warc.gz and proceed to download. To unpack this gzip you can use the gzip -d nfwritersontheirwayup.bcqkJvN.warc.gz command on Unix systems, or good old 7-zip on Windows.
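If you have neither gzip nor 7-zip at hand but do have Python installed, the standard library can unpack the file too. This is a minimal sketch, and gunzip is just a name chosen for this example. Unlike gzip -d, it leaves the original .gz file in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file and return the path of the unpacked copy.

    If dest is not given, the trailing ".gz" is dropped, so
    "group.warc.gz" becomes "group.warc" in the same folder.
    """
    src = Path(src)
    dest = Path(dest) if dest else src.with_suffix("")
    with gzip.open(src, "rb") as compressed, open(dest, "wb") as plain:
        # Stream in chunks so large archives do not need to fit in memory.
        shutil.copyfileobj(compressed, plain)
    return dest
```

Running gunzip("nfwritersontheirwayup.bcqkJvN.warc.gz") would write nfwritersontheirwayup.bcqkJvN.warc next to the download.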

Last but not least, you’ll need a WARC viewer. If this is your first time with WARCs, replayweb.page is very straightforward and runs right in your browser. Simply load the group’s WARC file and voilà, you can now navigate through the group’s chat logs.
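Before opening a viewer, you may want a quick peek at which pages a WARC actually holds. The sketch below reads the compressed file directly and collects the WARC-Target-URI header of each record. It is deliberately naive: target_uris is a made-up name, the limit of 20 is arbitrary, and a line inside a captured page that happens to start with the same text would be picked up too. A proper WARC library would be the safer choice for anything beyond a quick look.

```python
import gzip


def target_uris(warc_gz_path, limit=20):
    """Return up to `limit` captured URLs from a gzipped WARC file.

    Naive approach: scan for lines starting with "WARC-Target-URI:"
    rather than parsing records properly.
    """
    uris = []
    with gzip.open(warc_gz_path, "rb") as stream:
        for line in stream:
            if line.startswith(b"WARC-Target-URI:"):
                value = line.split(b":", 1)[1].strip().strip(b"<>")
                uris.append(value.decode("utf-8", "replace"))
                if len(uris) >= limit:
                    break
    return uris
```

If the list that comes back contains addresses for your group, you have the right file.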


Recovering your Yahoo! Groups from yesteryear is as simple as that. Got any questions? Or perhaps you have made some worthwhile discoveries while group hunting. Comment below!
