A New Breed of Digital Archiving and Preservation
Data Horde, Thu, 05 Nov 2020
https://datahorde.org/a-new-breed-of-digital-archiving-and-preservation/

There it is, because someone thought it ought to be out there. Perhaps a story you read, a picture you saw or even a game you played… It was there because someone poured their heart and soul into making it, and it mattered.

Alas, we find ourselves in an age where everyone collectively suffers from short-term memory loss. All that is ever on our minds is what’s relevant, here and now, and everything that is irrelevant is as good as imaginary.

The digital archivist or preservationist’s job is, ultimately, to save those things that matter. To that end they go to great lengths: downloading terabytes of data, reverse engineering decade-old websites or even hunting down the source code for the most obscure software.

But it is not an easy job and it certainly is not getting any easier. Things are disappearing at too fast a rate for even the most attentive archivists to keep up. Today, the digital archivist is fighting against currents they cannot overcome, with outdated wisdom. Instead, the archivist ought to find a way to swim with these currents, by taking advantage of tools and options that would previously have been unavailable to them.

In honor of Digital Preservation Day, I myself, as a digital archivist, would like to offer my own two cents for the next generation, that is to say a new breed of digital archivists. The three As: Adaptability, Acceptance and Acknowledgment.


Adaptability:

Keeping track of what is being retired, what websites are dying and mobilizing as quickly as possible!
[Image: flat lay shot of tools. Photo by Miguel Á. Padriñán on Pexels.com]

This past September deserves to go down in history as Shutdown September, seeing how many websites were shut down that month. And honestly, archivists were barely able to keep up.

  • Archive Team only got to work on archiving the massive Chinese social media site Tencent Weibo about 10 days after the shutdown announcement. While 248 TB is an impressive feat, it’s only a fraction of the web content on Tencent Weibo. A lot more could have been grabbed had action been taken sooner.
  • A similar case was the shutdown of Naver Matome, a kind of Japanese tumbleblog. Despite the early shutdown announcement in July, Archive Team’s archival project only began months later in September, with only about a week to spare.
  • YouTube recently removed their community contributions feature and Data Horde started a project to save unpublished drafts. Although the feature removal was known for 2 months, it took us weeks to notice that drafts were at risk and even longer to note that YouTube had restricted the feature last year, leading to many drafts never being published. We were lucky in that the drafts remained accessible for a month beyond the expected deadline, but it could easily have gone the other way.

Clearly, there is a need for a watchdog, or two, or three, to inform preservation groups of websites which are closing down, or features being retired, before the last minute. While tech news sites like The Verge might occasionally report on shutdowns, these reports are generally restricted to English websites. As for archivists, there are mainly three outlets:

  1. Archive Team’s Watchlist page.
  2. The Internet Archive Blog, if they are involved.
  3. And us, Data Horde, whenever we find out about a shutdown.

Other than that, it’s a matter of luck if a shutdown announcement makes it to the top page on Hacker News or Reddit. This is unreliable and we need to do something about it.

For starters, we need to monitor not only individual websites, but also massive platforms like YouTube, Twitter and Reddit. We need our own unofficial open documentation to note planned changes when they are hinted at in tweets or in blogs, prior to official announcements, so that we can mobilize dynamically.

As for websites, especially non-English ones, we need to make it easier for non-archivists who are concerned to be able to reach out to us. Which brings us to the second A: Acceptance…


Acceptance:

Come as you are. We do not just need programmers and librarians; anyone can contribute to preservation, and everyone should!
[Image: multicolored umbrella. Photo by Sharon McCutcheon on Pexels.com]

The archiving community is, by and large, in favor of collaboration and open source. But currently this only applies among archivists and preservationists. Even with the source code and tools out in the open, the average person will most likely lack the technical knowledge to understand what the hey they’re looking at.

As difficult as it might be to admit, digital archiving is not very well known. Even the words preservation or conservation evoke ancient manuscripts, or endangered animals. Digital preservation is far from the first thing to come to mind and this degree of obscurity is not something to be desired.

Even if this obscurity comes with a sense of pride from the joy of being one of a select few who know this craft, it has also come back to bite us. There is a good reason that a lot of people who discovered the Internet Archive through its National Emergency Library experiment earlier this year weren’t the biggest of fans. Naturally, people were more inclined to trust the verdict of their favorite authors, or to speculate, than to go and read what the NEL actually was and how it was justified.

If we, as preservationists, don’t promote our own work, why should other people? For all they know, we are all just rogue people with malicious intent. Then are we to grow old reclusively in our obscure hobby? Seeing how few of us archivists there are out there, I find it sad how many of us have made a name for ourselves as “grumpy old men”. We don’t have to be vagabonds, and we shouldn’t be. Because if we choose to alienate ourselves from the rest of the world, amateurs will develop their own archiving techniques to take our place.

  • The Save Yahoo Groups/Yahoo Geddon project was initially led by fandom. While Yahoo Groups might not mean a lot in 2020, many fan groups trace their origins to old mailing lists, some of which Yahoo had later acquired. When it was announced that all public groups would be made private, those people knew what was at stake: the history, the works, the memories of two decades.
    So they blindly charged in. None of them were proper digital preservationists, even if some members might have had an affinity for it. But they organized and developed their own method of hunting down and tracking downloads for ancient groups that the world had forgotten.
  • Another similar project is BlueMaxima’s Flashpoint, a massive effort to preserve Flash and other multimedia web-content which will (or might) break in the future due to incompatibility. Initial volunteers to the project had some programming knowledge and were motivated to preserve games from their childhood. But they discovered that if they laid down a clear path, other people in a similar situation would be more than happy to contribute.
    They developed their own tools for downloading and curating games into their Flashpoint collection so volunteers would not have to start from scratch. And they did not shy away from letting their project be publicized.

Both of these projects have come a long way, in large part due to the sheer enthusiasm to initiate these projects and later due to extensive help from seasoned archivists who took note of these ideas and supported and nurtured them.

The bottom line is, we need to not shoo people away, but embrace them. If an apprenticeship system seems too degrading, at the very least we, as archivists, should take note from Save Yahoo Groups and Flashpoint when it comes to writing our tutorials and publicizing our projects.

We shouldn’t merely lurk in IRC chatrooms; we need to be able to reach the same people we’re trying to help, even if it’s on social media. DEF CON is cool, but wouldn’t it be nice if we could stand on our own two legs and have a convention of our own?

And when people come to us, asking for help, with no credentials whatsoever, we need to learn how to help. Which brings us to the third A: Acknowledgment.


Acknowledgment:

Understanding our circumstances and constraints, recognizing that we are not all on equal footing and that we might have very different goals and respecting one another.
[Image: photo of people near wooden table. Photo by fauxels on Pexels.com]

It is entirely possible for a seasoned archivist to encounter someone who is much less knowledgeable in archiving, or even programming, asking for help. And as someone who dedicates their time to observing different archiving communities, I can attest that every group has its own focus and policy.

  • Mature archiving groups generally have a lot of tutorials and hardware they are more than happy to share, for their particular focus. Here, let’s acknowledge that every group has its own particular focus: information on a lost commercial is likelier to show up on the Lost Media Wiki, while news on a delisted game is more Dead Game News territory.
  • Semimature archiving groups generally function as cliques; they have the expertise but are constrained by resources. Anyone expecting help from higher-ups within the group should first make a name for themselves, to prove that their ideas are worth the high council’s time.
  • Some other groups are one-man armies; they find a very specific niche and document whatever they can. They might look to expand, or insist their archive is only for the fun of it.

So mature and small groups are comparatively stable, whereas semimature groups are considerably volatile. This is not really something we can change, so instead we should learn to acknowledge it. Some groups are always itching to lend a hand but other groups really won’t be willing to give outsiders, who have not proven their value and sincerity, the time of day.

Next we have to acknowledge expenses. Preservation is expensive, digital or otherwise. For some communities digitizing physical material is not easy, as the incomplete digitizations prior to the unfortunate fire at the National Museum of Brazil go to show. It is a harsh reality that we do not receive the necessary subsidies from states, often relying on donations.

Again, mature groups are self-sustaining. For every member who retires, there will be new volunteers joining in. They also often have some form of donation system to maintain and expand their hardware. As for archiving groups which are still developing, they will have their ups and downs. A single Tweet or Reddit post can bring in tens of new people overnight. Conversely, it’s not uncommon for splinter groups to emerge if their focus diverges from the majority. Money and hardware might be supplied through donations, but again it could always be cut off. Most notably, The Eye has been experiencing a lot of downtime and struggling to cover costs these past few months. Of course, if you can’t cover the costs in the first place you’re better off avoiding anything ambitious.

As much as we archivists advocate for open access, it’s a harsh reality that, more often than not, we are unable to actually provide it. People outside of archiving circles have already discovered how to turn a profit out of archived material, through ad revenue or paywalls. Take getdaytrends.com as an example, where you can look up Twitter trends per day, while they cover their costs through ads. Then perhaps some of the smaller archiving groups among us could survive through such practices, until they reach a more mature stage.

But as a maxim: Access should be opened chronologically, so that nothing which has been opened before is closed off. (Ex. uploads from the last year are behind a paywall, but on 1 January 2021, uploads from 1 January 2020 go open access.)
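To make the maxim concrete, here is a minimal sketch of such a rolling embargo. It assumes a one-year paywall window, as in the example above; the function names are hypothetical and are not taken from any archiving group’s actual tooling.

```python
from datetime import date

EMBARGO_YEARS = 1  # assumed embargo length, taken from the example above


def release_date(uploaded: date, embargo_years: int = EMBARGO_YEARS) -> date:
    """Return the day on which an upload leaves the paywall."""
    try:
        return uploaded.replace(year=uploaded.year + embargo_years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; open on 1 March.
        return date(uploaded.year + embargo_years, 3, 1)


def is_open_access(uploaded: date, today: date) -> bool:
    """True once the embargo on an upload has expired."""
    return today >= release_date(uploaded)
```

Because the release date depends only on the upload date, an item that has gone open access on one day is still open on every later day, which is the "nothing is closed off again" half of the maxim. That guarantee only holds as long as the embargo length itself is never increased.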

Let’s talk a bit about a community which actually is trying something similar to this, namely OldGameMags. OldGameMags was only a small group of people who’d met each other over the internet and had been pooling resources to gather old game magazines. And they were barely receiving any donations. So they came up with a clever idea: To gain access you either had to make a one-time donation, or actually send in a magazine to be scanned into their collection.

Except not everyone was happy with this. Back in June, OldGameMags and the Internet Archive were at a stand-off. OldGameMags proprietor Kiwi discovered that some of their magazine scans were being uploaded to the Internet Archive and, surprisingly, this was a repeated offense. While the Internet Archive is a massive, well-funded and well-known archiving platform, OldGameMags is a small group who gathered everything they had with blood, sweat and tears. From a utilitarian point of view, the IA volunteer who was mirroring OGM might have found a home which could host these scans with a much lower burden and make them accessible at a lower cost. But on the other hand, if they continued mirroring everything on OGM, there would no longer be any reason for people to join OGM. This was their passion and from OGM’s point of view, it was as good as stealing (even if a donation had been made). Eventually the two sides were able to reach a settlement, but if IA and OGM had been able to better communicate their intentions, such a conflict of interest might have been resolved a lot sooner.

Then finally, as it stands, we must acknowledge one another. We’re all in this preservation business together after all, even if our goals might differ. To reiterate, most of us don’t have states, companies or foundations subsidizing our efforts. So we ought to support one another, technically and financially.

This is why we need arbiters such as the DPC, SPN and IIPC to bring us together. We need to do what we can to preserve the preservationists, because nobody else will! The best way to do that is to acknowledge one another, working synergistically and doing our best to not offend one another.


[Image: shallow focus photography of hourglass. Photo by Jordan Benton on Pexels.com]

Preservation means preserving the past, yet we do it not for the past, but for the future. Then what right do we have to object so vehemently to getting with the times? The world is changing; it’s high time we caught up.

We need adaptability to achieve dynamism, to keep up with the rapid decay of digital information. We need acceptance to foster new ideas and train new archivists. And we need acknowledgment to protect and support one another.

So as we enter a new decade, may these words usher in a new era. A new era for a new breed of digital archiving and preservation…
