Get Your Data tool to download their messages and other data, prior to the shutdown, but many people were unable to respond on short notice.
Thankfully, owing to the efforts of the Save Yahoo Groups Project and Archive Team, the data of many groups has been preserved. If you missed out on the GYD tool, you might still be able to retrieve your groups’ data by following the steps below.
To begin, can you remember your group’s name? If yes, the following steps will go by a lot faster; if not, you might want to make a list of potential names to try. Was the name of your group Fireflylovers, or Firefliers, or LoversofFF? Write down all likely candidates.
For demonstration’s sake, let’s search for data on NFforKids, a non-fiction writing group.
Let’s perform a metadata search to see when NFforKids was started. Head over to the Yahoo Groups Metadata Collection page on the Internet Archive. Ignoring the no-preview warning, either click on Show all files or scroll down until you see DOWNLOAD OPTIONS on the right side of the page.
Click on COMMA-SEPARATED VALUES to reveal a list of files. Since NFforKids starts with an N, it will be indexed under master_N.csv if it exists at all. Download this CSV file to your device.
You can now open this CSV file using Excel or another spreadsheet program. Search for NFforKids to find the corresponding row. What do you know? NFforKids was started on 11 June 2000. You can scroll across this row to find the group’s primary language, the category of the group, whether the group was public or not, and more!
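If you would rather not open a large CSV in a spreadsheet program, the same lookup can be scripted. Below is a minimal Python sketch, not an official tool. It assumes the file is UTF-8 text, and since the column layout is not described here, it simply checks every cell in every row for an exact, case-insensitive match. The function name find_group_rows is made up for this example.

```python
import csv


def find_group_rows(csv_path, group_name):
    """Return (row_number, row) pairs whose cells include group_name.

    Assumption: the column layout is unknown, so every cell is compared
    (case-insensitively) instead of relying on a specific column name.
    """
    wanted = group_name.strip().casefold()
    matches = []
    # errors="replace" keeps the scan going if a few bytes are not valid UTF-8.
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
        for row_number, row in enumerate(csv.reader(handle), start=1):
            if any(cell.strip().casefold() == wanted for cell in row):
                matches.append((row_number, row))
    return matches
```

Calling find_group_rows("master_N.csv", "NFforKids") returns an empty list if the name is not in the file, which is your cue to move on to the next candidate name.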
If you weren’t able to find metadata on your group, it’s time to pull up that list I told you to make above. Fall back to the other candidates and try another name. If the first letter (or two) of this second name is different, you will need to download the corresponding CSV file before resuming your search.
Please note that while the Yahoo! Groups collections on the Internet Archive are thorough, they are NOT exhaustive. It is entirely possible that data on your group was missed. That being said, the metadata collection sports a whopping 1.1 million groups. If you weren’t able to find your group in the first round, you may well have misremembered the name, so keep on trying!
Once you have confirmed the name of your group, and that it has been catalogued in the Metadata Collection, you can then download the corresponding TAR file, which contains even more details. Again, if we’re looking for a group called NFforKids, we’ll be looking for the first two letters from the list. That’s NF.tar for NFforKids.
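If your list of candidate names is long, it can help to work out up front which files each name needs. The Python sketch below only mirrors the two examples given above (master_N.csv and NF.tar for NFforKids). The upper-casing, and the idea that every file follows the same pattern, are assumptions; the helper names are hypothetical.

```python
def metadata_files_for(group_name):
    """Guess the CSV and TAR file names for a group.

    Assumption: files follow the pattern of the NFforKids example,
    i.e. master_<first letter>.csv and <first two letters>.tar, upper-cased.
    """
    name = group_name.strip()
    if len(name) < 2:
        raise ValueError("group name must have at least two characters")
    return f"master_{name[0].upper()}.csv", f"{name[:2].upper()}.tar"


def files_for_candidates(candidates):
    """Map each candidate group name to its guessed (csv, tar) file names."""
    return {name: metadata_files_for(name) for name in candidates}
```

For the candidates from earlier, files_for_candidates(["Fireflylovers", "LoversofFF"]) would point you at master_F.csv and FI.tar for the first name, and master_L.csv and LO.tar for the second.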
If you’re on Mac or Linux, you should be able to open this .tar file to reveal a folder titled media. If you’re on Windows, you can use 7-zip to open it. This TAR file contains the same information as the CSV, plus additional details. Did the group have spam filtering? Was media sharing allowed, or was the group text-only? You might even find the URL for group images, although unfortunately most of those links are now dead.
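Python’s standard tarfile module can also peek inside the archive on any operating system, without extracting everything. This is a rough sketch. It assumes that a group’s entries carry the group name somewhere in their path, which the steps above do not confirm, so treat an empty result as "check by hand" and not as "not there".

```python
import tarfile


def find_members(tar_path, group_name):
    """List archive entries whose path mentions group_name (case-insensitive).

    Assumption: entries for a group include the group's name in their path.
    """
    wanted = group_name.casefold()
    # Mode "r" lets tarfile detect plain or compressed archives on its own.
    with tarfile.open(tar_path, "r") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if wanted in member.name.casefold()
        ]
```

For example, find_members("NF.tar", "NFforKids") returns the matching paths, which you can then pull out individually with your usual archive tool.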
Stats are fine and dandy, but what about messages or activity? If your group was restricted, tough luck: you’ll need to find a member who made a GYD copy before the shutdown. This is where our luck with NFforKids has run out, seeing as the group’s messages were not public. For the final step, let’s switch to a public group whose history is visible. We’ll go with nfwritersontheirwayup. Messages in this group were visible to all subscribers, so archivists were able to grab its contents.
Raw data collections are stored in assorted, non-alphabetical batches. To see if a group has its raw data available on the Internet Archive, simply query subject:"yahoo groups" nfwritersontheirwayup. If you get any results, your group’s raw data is most likely located there. You can double-check the item description to be sure that nfwritersontheirwayup is indeed included in the batch.
Pop open the WEB ARCHIVE GZ download option from the left side of the page. Scroll down until you see nfwritersontheirwayup.bcqkJvN.warc.gz and proceed to download. To unpack this gzip file, you can use the gzip -d nfwritersontheirwayup.bcqkJvN.warc.gz command on Unix systems or good old 7-zip on Windows.
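If you have Python installed, there is also a route that works the same way on every operating system. This is a small sketch using only the standard library. Unlike gzip -d, it leaves the original .gz file in place, and the gunzip helper name is just for this example.

```python
import gzip
import os
import shutil


def gunzip(src_path, dest_path=None):
    """Decompress a .gz file next to the original and return the new path."""
    src_path = os.fspath(src_path)
    if dest_path is None:
        if not src_path.endswith(".gz"):
            raise ValueError("expected a file name ending in .gz")
        dest_path = src_path[: -len(".gz")]
    # Stream in chunks so large archives do not have to fit in memory.
    with gzip.open(src_path, "rb") as compressed, open(dest_path, "wb") as out:
        shutil.copyfileobj(compressed, out)
    return dest_path
```

Running gunzip("nfwritersontheirwayup.bcqkJvN.warc.gz") would write nfwritersontheirwayup.bcqkJvN.warc into the same folder.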
Last but not least, you’ll need a WARC viewer. If this is your first time with WARCs, replayweb.page is very straightforward and runs right in your browser. Simply upload the group’s WARC file and voila, you can now navigate through the group’s chat logs.
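Before loading a file into a viewer, you may want a quick list of the addresses it contains. The sketch below is a deliberately minimal, standard-library reader and not a full WARC parser. It walks the file record by record, reads each record’s header block, and uses the Content-Length header to skip over the body. It assumes well-formed records without folded header lines; a dedicated WARC library would be the sturdier choice for anything serious.

```python
import gzip


def iter_warc_headers(path):
    """Yield one dict of lower-cased header fields per WARC record."""
    path = str(path)
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as stream:
        while True:
            line = stream.readline()
            if not line:
                return  # end of file
            if not line.startswith(b"WARC/"):
                continue  # blank separator lines between records
            headers = {}
            while True:
                line = stream.readline()
                if line in (b"\r\n", b"\n", b""):
                    break  # blank line ends the header block
                name, _, value = line.decode("utf-8", "replace").partition(":")
                headers[name.strip().lower()] = value.strip()
            # Skip the record body so its contents are never read as headers.
            remaining = int(headers.get("content-length", "0"))
            while remaining > 0:
                chunk = stream.read(min(remaining, 1 << 20))
                if not chunk:
                    break
                remaining -= len(chunk)
            yield headers


def target_uris(path):
    """Return the WARC-Target-URI of every record that has one."""
    return [
        headers["warc-target-uri"]
        for headers in iter_warc_headers(path)
        if "warc-target-uri" in headers
    ]
```

The function accepts either the unpacked .warc or the original .warc.gz, so target_uris("nfwritersontheirwayup.bcqkJvN.warc.gz") works without unpacking first.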
Recovering your Yahoo! Groups from yesteryear is as simple as that. Got any questions? Or perhaps you have made some worthwhile discoveries while group hunting. Comment below!
The story did not end there, however. So let’s talk about what has transpired since…
Although we had reported 30 January as the final deadline, Yahoo continued to accept Get My Data (GMD) requests for about another week, so active efforts only ceased around that time. Then came the waiting game, as it took a few more weeks for some of those GMD requests to process.
By late February, most of the volunteers had disbanded or moved on to other projects. But there was still much to be done. For one thing, people had rushed so much to grab everything they could that a lot of these group files were a total mess, not made any better by how Yahoo’s GMD exports worked. So the remaining volunteers stuck around to label their massive collection.
Doranwen, one of the leads on the Yahoo-Geddon (aka Save Yahoo Groups) project, frequently documented their progress during this time.
A few numbers and random other bits of info:
~2 TB of fandom data saved (that I know of, for now)
~200,000 confirmed fandom groups saved in some fashion
~2,000 Sims groups saved* …*The only reason I know the Sims number is because I was tracking those groups on Google spreadsheets in order to find all of them and get volunteers to join them. For other fandoms it’s impossible to give any sort of number at this point (although I know there was a ton of LOTR, HP, Buffy, and Westlife, lol). Yahoo’s categorization was terrible and a group name doesn’t always give good clues as to whether it’s fandom/non-fandom. Getting that sort of data will take a good deal of time and work.
Doranwen, The end of Yahoo Groups – a few thoughts & stats
Another issue was that the collection was not actually unified. Archive Team had also archived a bunch of data, so the Yahoo-Geddon team continued to label those batch by batch for a few more months.
It truly is endless!!
Yahoo-Geddon volunteer, 14 July 2020
Yet another reason the Yahoo-Geddon team was taking so long was how meticulous they were. They worked to curate this collection not only for the sake of archiving and tracing the history of fandom, but also to provide a rich dataset that researchers might want to use in the future.
-[Stage] 4.5b: Remember that we got a bunch of groups from scrounging the links of other groups for new groups to join? Some of the commands used to process that data generated “groups” that never existed (with http: stuck at the end, apostrophes or commas in them, etc.). Also one stage of the spreadsheet work ended up with a certain number of groups getting a duplicate version added to the spreadsheet with _dupe after the name.
So for this stage I send the spreadsheets to my assistant who runs a script against them to find groups with punctuation in them or _dupe at the end. A very very tiny number of very old (grandfathered from who knows which list service) groups actually legitimately have periods in their names, but in most cases groups with periods never existed either.
This process is fairly quick for each letter but varies greatly in what has to be done, as sometimes group folders are affected (and some punctuation marks Yahoo simply ignored everything from that mark onwards and treated the letters before it as a group name).
Yahoo Groups metadata processing steps, stage 4.5b
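To give a feel for the kind of check described in stage 4.5b, here is a short Python sketch. It is not the script the Yahoo-Geddon team actually used. It assumes that a genuine group name consists only of letters, digits, underscores and hyphens, and it flags everything else, plus any name ending in _dupe, for a human to review. As the quote notes, a handful of very old groups legitimately contain periods, so flagged names are candidates and not verdicts.

```python
import re

# Assumption: real group names use only these characters.
VALID_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def classify_group_name(name):
    """Return "dupe", "punctuation" or "ok" for a single group name."""
    if name.endswith("_dupe"):
        return "dupe"
    if not VALID_NAME.match(name):
        return "punctuation"
    return "ok"


def flag_suspicious(names):
    """Map each suspicious name to the reason it was flagged."""
    flagged = {}
    for name in names:
        reason = classify_group_name(name)
        if reason != "ok":
            flagged[name] = reason
    return flagged
```

Feeding it a column of names from a spreadsheet returns only the entries that need a second look, such as a name with http: stuck at the end.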
Sadly, Yahoo!, blind as ever to Yahoo-Geddon’s efforts, has decided to permanently shut down Yahoo Groups. Although Yahoo Groups had retained only its bare-bones features, this will put an end to some decades-old mailing lists…
On a related note, an interesting discovery Yahoo-Geddon made is that Yahoo actually has not deleted archives, photos and files but only removed public access.
The files are still there, from what I can tell! They’ve just blocked us from getting to them.
The monthly reminder emails with attachments are still coming in – and the attachments come from files in the files sections. Clearly those were never removed.
Which means that Yahoo could have chosen to grant us access to all of that for a full year before closing Groups entirely, but did not.
via the Save Yahoo Groups Discord server
It just goes to show that curation is one half of archiving/preservation… If you would like to learn more or even participate in Yahoo Group dissection, check out the Save Yahoo Groups Discord server: https://discord.com/invite/DyCNddf
What do they do?
First of all, IA works to digitize new material, such as books or VHS tapes that probably haven’t made it onto the internet yet.
They host a number of collections, which are often curated by libraries or educational institutions such as the New York Public Library and the University of Toronto.
And then there’s the Wayback Machine, which started it all! It allows you to capture snapshots of webpages. As the name suggests, it works like a time machine, allowing you to view past versions of websites, or even websites which are no longer online.
They also have a whole bunch of other projects, including one which allows users to borrow rare books from libraries and keep a 14-day e-book version. See https://archive.org/projects/ for more information.
How do they do it?
Although it may come as a surprise, the Internet Archive has a physical location. The physical (books and similar materials) and virtual archives (servers and digitization equipment) are located inside of a former Christian Science church.
Most of their work comes out of here, although they are known to often collaborate with other libraries/archives or acquire collections from different collectors.
How do I sign up?
If you would like to work for the Internet Archive at their physical location you could check out https://archive.org/about/jobs.php.
That being said, anyone can browse the archives*, and you can start an account from anywhere in the world if you’d like to upload items of your own**. You heard right! All you have to do to contribute to the Internet Archive is sign up right from the comfort of your home.
So what are you waiting for? Become an Internet Archiver today! https://archive.org/account/signup
*Browsing certain material (generally sensitive or graphic content) might require you to sign in with a registered account.
**Using the Wayback Machine to make captures of websites won’t require a registered account.
Looking to discover other archiving communities? Just follow Data Horde’s Twitter List and check out our other Community Spotlights.
Who are they?
If you’re here right now, chances are you’ve heard the name “Archive Team” before. They might not be the largest internet archiving group, but they are certainly the most influential.
What do they do?
Save your stuff. And “your” means everyone’s: from links on ancient forums, to news reports people will forget, to music videos on dying platforms. Archive Team mostly focusses on extracting web content, often outsourcing its later distribution to the Internet Archive. If a website is reported to be shutting down some time soon, it’ll only be a matter of time before they catch wind of it.
How do they do it?
For most cases they have a standard solution, which anyone can download, known as the Warrior (2). It works by downloading website contents from a website that might not be able to maintain its content (1) to a virtual machine. This content is then passed to a Tracker (3) server which keeps track of what is collected and what else is to be collected. These are then sent to Servers (4) run by dedicated volunteers from the Archive Team, for temporary storage. The final destination will usually be the Internet Archive (5) once the content goes offline for good.
How do I sign up?
Archive Team is entirely composed of volunteers. Although members maintain a small presence on Twitter and Discord, their main hub is https://archiveteam.org, a wiki where they keep track of ongoing projects and have links to resources such as the aforementioned Warrior. For more “real-time” communication you can find them on the #archiveteam channel on EFnet:
http://chat.efnet.org:9090/?channels=%23archiveteam
(Image taken from: https://www.archiveteam.org/index.php?title=Yahoo!_Groups)
Still, many fan communities traced their origins to the mailing lists, with older members sometimes recounting terms, stories or jokes that originated in those days to the newer members. It’s safe to say that these groups left behind quite a legacy, which Verizon (Media) recently decided to wipe off the face of the earth.
As far as I know, there has been no formal archiving project for fandom Yahoo Groups prior to this. During the time that Yahoo Groups was most active, there were fan fiction archives that sometimes duplicated what was at Yahoo Groups. But an enormous amount of fandom content at Yahoo Groups has never been archived.’
The Yahoo Groups Story is a fine tale which shows how different teams with complementary abilities and backgrounds can work together to accomplish things neither could have done as well on their own. If you too would like to become a part of this story, you can head on over to the Discord server and see if you can reach any of the owners that they’re looking for.