web – Data Horde

Data Log 2023-10-06 Weekly News

themadprogramer — Fri, 06 Oct 2023 21:52:46 +0000

Discord CDN – FIFA delisting – Nintendo online shutdowns – Typepad – Goodboy Galaxy – Matrix

Jason Citron quote is from New York Times
Discord CDN filehosting changes rumor and announcement
EA/FIFA game delistings
Nintendo 3DS/Wii U online service shutdown
Typepad
Goodboy Galaxy Interview
Discord-Matrix Bridge: Out-of-your-element

BG: Jungle Waterfalls by Mark Ferrari.

Music: Meadow Breeze written by TECHNOTRAIN on Dova Syndome.

Data Log 2023-09-30 Intro to Phone Preservation

themadprogramer — Sun, 01 Oct 2023 21:09:44 +0000

What did you have before smartphones? Madpro and Donut talk about old phones, retro phones and retrofuturistic phones.

Podcast: https://feeds.acast.com/public/shows/data-log
Old Phone Preservation: https://twitter.com/OldPhonePreserv
More info on i-Mode: https://www.gamingalexandria.com/wp/2020/04/complete-guide-to-i-mode-games/
Emoji and the Unicode Consortium: https://home.unicode.org/emoji/about-emoji/
Snake on 7-Segment Displays, DIY project for the insane engineers among you: https://hackaday.io/project/166556-snake-on-7-segment-displays
7-segment by Vishnu Mohanan on Unsplash: https://unsplash.com/photos/9mSe-QS5JrA
3G by SoQ錫濛譙 on Flickr: https://www.flickr.com/photos/qiaomeng/5059237997
Broadcasting tower in Trondheim, Norway by on Wikimedia: https://en.wikipedia.org/wiki/Radio_broadcasting#/media/File:Tyholt_taarnet.jpg

Data Log 2023-09-17 Unity Platform Runtime Fee Controversy

themadprogramer — Sun, 17 Sep 2023 23:07:03 +0000

The Unity Engine is a popular 3D engine for making games and other interactive media. In this episode of Data Log glmdgrielson and madpro talk about how game designers and gamers are upset with the Unity platform’s new payment scheme.

David Helgason Interview: https://www.pocketgamer.biz/interview/58959/not-everyone-needs-to-be-king-or-supercell-unitys-david-helgason-on-unity-5-everyplay-and-unreal-engine/
Unity Pricing Changes Blogpost: https://blog.unity.com/news/plan-pricing-and-packaging-updates
Cult of the Lamb Steam Page Announcement: https://store.steampowered.com/news/app/1313140/view/7249162346641740420
The Chaos at Unity by Alex Heath: https://www.theverge.com/2023/9/15/23875408/command-line-the-chaos-at-unity
BlueMaxima’s Flashpoint (Home to some Web Unity Games): https://flashpointarchive.org/
Fundraiser for Donkey Kong Programmers Story: https://hitsave.org/fundraiser-translation-of-donkey-kong-programmers-story/

Data Log 2023-01-26 What is Archiving?

themadprogramer — Thu, 26 Jan 2023 23:37:35 +0000

The first ever episode of Data Log: The Archiver’s Favorite Podcast. Learn about what archiving is and how to join the archiving community!

Ruffle: https://ruffle.rs/
Public Domain Day 2023 Film Contest: https://blog.archive.org/2023/01/21/public-domain-day-film-contest-highlights-works-of-1927/
Internet Archive: https://archive.org/
Internet Archive Twitter: https://twitter.com/internetarchive
Internet Archive Mastodon: https://mastodon.archive.org/users/internetarchive
Jason Scott: https://twitter.com/textfiles
Center for MI Jewish Heritage: https://www.tiktok.com/@mijewishheritage

Twitter to Begin Purging Inactive Accounts

themadprogramer — Sat, 10 Dec 2022 17:01:41 +0000

Edit: May 8 2023, Elon Musk has announced that the account purge has come into effect, read on to learn more about what you can do if this affects you.

We’re purging accounts that have had no activity at all for several years, so you will probably see follower count drop
— Elon Musk (@elonmusk) May 8, 2023

Archived Snapshot: https://archive.md/PYJ2E

Yesterday, Twitter CEO Elon Musk announced a purge on inactive accounts. Musk has cited the reasoning as “freeing [up] the name space” for users who might want a new handle. Musk then went on to assure Twitter that these “1.5 billion accounts” would be accounts which have not tweeted or logged-in for the last few years.

Twitter will soon start freeing the name space of 1.5 billion accounts
— Elon Musk (@elonmusk) December 9, 2022

Archived Snapshot: https://archive.ph/9tEMm

This is very much, an expected move. You might recall from our blogpost a while ago that Musk had expressed interest in purging accounts early in November. It should also be noted that Twitter’s previous management had failed a similar policy change in 2019, to expire accounts which had not been logged into for 6 months. That policy change failed due to outrage and protest across the platform.

Definitely
— Elon Musk (@elonmusk) November 1, 2022

Archive Snapshot: https://archive.ph/hcKsV

Well then, should you be worried? Probably not for your own account. Some have expressed concern over the accounts of loved ones or deceased celebrities, which will more easily fit this criteria. If you have such concerns, we can recommend a useful utility from our friend JustAnotherArchivist, called snscrape. snscrape allows you to save public tweets from accounts on Twitter. It also works for a few other websites like Facebook and Reddit.

Simply install with: pip3 install snscrape

The code is available at https://github.com/JustAnotherArchivist/snscrape, but if you have Python 3 installed pip3 install snscrape is all you need to install it.

From your terminal or command prompt, the following command will save a local archive of Elon Musk’s tweets:

snscrape --jsonl twitter-user elonmusk > muskytweets.json

And if you want another account, just substitute the username and a file to save to:

snscrape --jsonl twitter-user RetroTechDreams > RTD_tweets.json

I might write a more detailed tutorial on snscrape if people are interested.

But for the time being, spread the word! Save what endangered accounts are valuable to you and be sure to tell all of your friends about snscrape.

Twitter in Trouble? Why you should Archive your Tweets

Twitter in Trouble? Why you should Archive your Tweets

themadprogramer — Mon, 05 Dec 2022 17:04:49 +0000

Twitter has seen some radical restructuring since Elon Musk’s acquisition over a month ago. Now is a good time as ever, that we talked about what options you have in archiving or preserving your Twitter content.

This new era of Twitter has been quite turbulent, to say the least. More than half of the workforce has been fired or has quit, and site functionality is becoming unstable, as reported by the Seattle Times. Mastodon has emerged as a serious Twitter alternative. In fact, some of those who have departed Twitter now have their own Mastodon instance over at macaw.social. Personally, I am excited about the rise of mastodon as an alternative as I have been posting Data Horde updates over at @[email protected] for about two years now.

So, why not leave Twitter behind and move on? Now, Twitter allows you to request a copy of your personal data: Tweets and all. But it’s probably hard to leave a site that you have been on for over a decade. Especially, when requesting your personal archive is not even working correctly. Many people have reported that archive requests are being ignored or processed with delay. On a test account, we at Data Horde found that it took over 3 days to receive a personal archive.

Tweeters complaining about being unable to export personal archives: view snapshot at archive.is

In 2022 this is a big deal, not only for archivists but also for legality. Article 13 of the GDPR mandates a responsibility to provide a copy of collected data to users (i.e. data subjects) upon request. Outside of Europe, California’s CCPA has a similar clause protecting the right to know.

There are repercussion for not respecting these rules. Recently another messaging app, Discord, was fined 800 000 Euros for failing to respect data retention periods and security of personal data by French Regulator CNIL. That was actually a reduced fine, given Discord’s conciliatory attitude. If Twitter does not up their game, they may meet a similar fate, if not a worser one.

Now that I have your attention, I would like to direct it to the help page on how to request a personal archive from Twitter: https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive . Even if a bit unstable, this is what you need to follow to save a copy of your Tweets.

The Twitter archive is big and burly but not perfect. Johan van der Knijff recently wrote a blogpost on some shortcomings, such as the t.co URL-shortener and some workarounds: https://www.bitsgalore.org/2022/11/20/how-to-preserve-your-personal-twitter-archive

Oh, and by the way. It gets worse: Elon Musk has also stated interest in purging inactive accounts and their Tweet history.

Definitely
— Elon Musk (@elonmusk) November 1, 2022

Archive Snapshot: https://archive.ph/hcKsV

This might not seem like a big deal, except to the one or two of our readers who periodically scrape politician accounts off of https://ballotpedia.org. Yet it is actually a serious turning point. Currently, Twitter does not purge inactive accounts, except in the event of death or incapacitation and by special request.

In 2019 there was an attempted Twitter policy change to expire accounts which had not been logged into for 6 months. This sparked outrage across the platform by those who saw this as unfair to the memory of inactive accounts. In particular, fans of deceased K-Pop artist Kim Jong-hyun, otherwise known as Jonghyun (김종현/종현) came to the defence of his legacy overturning the attempt altogether. Turning back on this decision would go against all of that heritage, people’s heritage, Twitter’s heritage, web heritage. Alas this the projected course of things, even if we cannot prevent it, it is perhaps our duty to protest why it is wrong.

What about the extreme scenario of a total collapse of Twitter? What does that mean for web history? Well, the good new is that people have been thinking on this for much longer than before this year.

Already in 2010 the Library of Congress announced that they would be copying the entire internal archive of Twitter, starting from March 2006.

Library to acquire ENTIRE Twitter archive — ALL public tweets, ever, since March 2006! Details to follow.
— Library of Congress (@librarycongress) April 14, 2010

Archive Snapshot: https://web.archive.org/web/20161208074132/https://twitter.com/librarycongress/statuses/12169442690

There are also many smaller grabs on the Internet Archive and archive.today, some of which you have seen linked above. Special mention goes to Archive Team‘s periodical Twitter Stream archive.

Last but not least, you can help! The Internet Archive is collecting Tweet dumps from people as we speak: https://archive.org/services/wayback-gsheets/archive-your-tweets Whether you just want extra insurance for your back-up, or to contribute to the wealth of the web you can help by using the above tool to upload your Tweets to the Internet Archive for generations to come.

Action Script 3 now supported in the Ruffle Emulator

themadprogramer — Tue, 30 Aug 2022 01:00:35 +0000

Flash, once the web’s sweetheart in games and animation, has today fallen into obscurity. Since the end-of-life two years ago, Flash media has become virtually unplayable. But things are changing with emulators like Ruffle.

Not all Flash media is the same. You see, the interactivity in Flash relies on a language called ActionScript. In 2006, ActionScript 3 came out with new features. Alas it was backwards-incompatible with AS2 and so not everyone was to keen on it. It wasn’t until the release of Flash Professional CC (2013), that authors were required to switch to AS3.

This has made Flash emulation quite a challenge. Understandably, Flash emulators have had to choose between prioritising AS2 and 3. For example the emulator Shumway focussed on AS2 (with some AS3 support) and Lightspark focussed on AS3. Unfortunately, Shumway hasn’t been updated in ages and Lightspark isn’t browser-based.

On the other hand, Ruffle, a relative newcomer to the Flash emulation scene has been picking up speed. Written in Rust and sporting WebAssembly, it runs wicked fast and cross-platform! Though Rust’s focus has too been mostly in one direction, namely AS2, they have started to make progress on AS3 well. Below is an abridged version of an announcement shared on the Ruffle Discord Server by our friend Nosamu.

Ruffle Compatibility Coverage

The first few ActionScript 3 games are finally playable in Ruffle, Demos below! One of the first fully-playable games is Not To Scale, a simple but clever photo puzzle! You can try it out right now on Newgrounds: https://www.newgrounds.com/portal/view/575849/format/flash?emulate=flash

Even more exciting, the beautiful minigolf game Wonderputt is now mostly playable with the Ruffle desktop app! The first hole is quite tricky due to collision bugs, but a fix is in progress, along with web performance improvements.

Watch our #announcements channel for updates in the coming weeks! As always, you can download Ruffle from https://ruffle.rs/#downloads.

But wait, there’s more – Ruffle web builds now have a fancy loading animation! If you own a website, now is the perfect time to update Ruffle! No longer will your visitors be greeted with a blank white screen while waiting for Ruffle to load. Check out the animation:

Also, if you’d like to add your own flair to the loading screen or disable it altogether, there are a few customization options: preloader, --preloader-background, and --logo-display. For more information, see our wiki.

And finally, we’re looking for help developing an official Ruffle app for Android! If you have experience with Rust development targeting Android, please check out @szőlő’s WIP repository: https://github.com/torokati44/ruffle-android and join the development thread: Native Android App.

Do you have a favorite Flash game you just wish you could play right now? It’s not emulation, but Bluemaxima’s Flashpoint collection might be able to run what you are looking for right now. Be sure to also check out our Flash Player Emergency kit for more tips on Flash after its end-of-life.

The Flash Player Emergency Kit

Archive95: The Old Man Internet

glmdgrielson — Thu, 21 Jul 2022 23:28:13 +0000

The internet is kind of old. To be fair, so is the Internet Archive and its Wayback Machine. But IA isn’t older than the internet (how could it be?) so there are some things that could slip through the cracks. Things before its founding in 1996, for example.

Then comes along Archive95 which is an archive of the pre-IA internet of 1995. It primarily uses two sources, the World Wide Web Directory and the German language Einblick ins Internet, to give an impression of an era when the web was small and monitors were bulky as heck.

– glmdgrielson, a young whippersnapper

Remembering YouTube’s Lost Unlisted Videos

themadprogramer — Thu, 12 May 2022 22:55:50 +0000

Melinda teaches high school in the Bay Area and recently reached out to us with a problem. Her students just finished a video history project that she wanted to share with their parents and classmates. But she was concerned about posting the videos publicly because she didn’t want the whole world to find them (frankly, neither did her students). Melinda told us YouTube’s private sharing options — a 25-person cap that’s limited to other YouTube users — didn’t work for her. She needed a better option to privately share her students’ talent.
Later today, we’ll be rolling out a new choice that will help Melinda and other people like her: unlisted videos.
Jen Chen, Software Engineer at Google, https://blog.youtube/news-and-events/more-choice-for-users-unlisted-videos/

On this day, 12 years ago, YouTube introduced unlisted videos as a compromise between a public and a private video. Perfect for sharing your history project with friends, video outtakes, or just about anything you didn’t want cluttering your channel.

Some time later, a non-targetted exploit was discovered which could reveal the links of existing YouTube videos, but not the content itself. So in 2017, YouTube changed how links were generated to make links more unpredictable. It could have ended there, but it didn’t.

YouTube will Private Old Unlisted Videos Next Month

Years later in 2021, YouTube decided that having their links be hypothetically predictable, might be problematic for old unlisted videos. So they decided to haphazardly automatically private old unlisted videos, uploaded prior to 2017.

Users were offered an option to opt-out, if their channels were still active AND they acted within a month of the announcement. Unfortunately millions of videos were lost in the name of security. Vlogs, school projects, outtakes, patreon videos; things people wanted to share BUT they didn’t private.

Is there any silver lining to all of this? Not all is lost. There are collections like filmot which offer a non-invasive database of metadata on these unlisted videos, minus the videos themselves. There was also a project by Archive Team to archive a few TBs of unlisted videos, even if only a small sample. More than anything, YouTubers have been uploading re-uploads, in the case of inactive channels and/or significant unlisted videos.

Not to sound like a beggar, but we would really appreciate it if you could share this short blog post. Almost one year later this situation has still not become common knowledge. Also be sure to check out our unlisted video countdown from last year:

2/5 Thank you for sticking with us these past 30 days, here's a complete playlist if you'd like to watch all the videos on YouTube while you still can.

https://t.co/pXyzDl9BfD

Note that you should go to https://t.co/BLlMJfP9gb for videos with annotations.
— Data Horde (@DataHordeBlog) July 22, 2021

Stuck in the Desert, or Video Strike Team

glmdgrielson — Mon, 28 Feb 2022 17:22:35 +0000

This is an interview with Sokar, of the Video Strike Team, conducted over IRC. The VST is an archival group of a rather small scope: preserving a particular stream, Desert Bus For Hope.

Desert Bus For Hope is a yearly charity stream, running under the premise that the more money that is received, the longer the stream goes on for, and the more the organizers have to play the dullest video game imaginable. So dull, in fact, that Desert Bus has never been officially released, actually. This year’s fundraiser gave us a stream that is just exactly an hour under one week: 6 days and 23 hours! So this was a very long stream with a lot of data to preserve. So follows the story of how that happens.

Note: DBx refers to the iteration of Desert Bus for Hope. For example, this year, 2021, was DB15. Also, I have only minimally modified our interview, by adding in links where applicable and making minor spelling corrections.

glmdgrielson: So first off, outside of the VST, what are you up to?

Sokar: I do video editing and Linux server security / software support, and various other (computer related) consulting things for “real work”.

g: So you started off with just the poster for DB6, according to the site, correct? How did that work?

S: We didn’t actually start doing the interactive postermaps till DB8, then I worked backwards to do all the previous ones (still not done).
The VST itself started formally during DB6.

g: That’s when Graham contacted MasterGunner, who presumably contacted you, correct?

S: Tracking the run live in some way was a confluence of ideas between me, Lady, and other members of the chat at the time, Graham knew how to get ahold of Gunner about making live edits because he was one of the people who helped with the DB5 torrent.
I honestly don’t remember how most of the DB6 VST crew was put together, it was very last minute.

g: Do you know anything about how that torrent was made?

S: The first DB5 torrent?

g: Yes.

S: Kroze (one of the chat mods) was physically at DB5 and brought a blank external HDD with him specifically for recording the entire stream, then after the run Fugi and dave_random worked together to create the torrect (with all the files split into 15min chunks) I wanna say the torrent file was initially distributed via Fugi’s server.
DB5 was the first time the entire run was successfully recorded.
LRR had previously toyed with the idea (DB3, but ended up doing clips instead) and steamcastle attempted to record all of DB4 but was unsuccessful.

g: And DB6 was the first year the VST existed. What was that first year like?

S: The first year was VERY short handed, we only had 14 people, a LOT of the “night” shifts were either just me by myself or me and BillTheCat
We really didn’t know what we were doing, the first rendition of the DB6 sheet didn’t even have end times for events.
There was just “Start Time” “Event Type” “Description” and “Video Link”.
At some point we (the VST) will just re-spreadsheet the entire run, because we were so short handed we missed a lot of things, when I went back to make the DB6 postermap I think I ended up uploading ~17(ish) new videos because that was how many posterevents weren’t even on the sheet.

g: What sort of equipment or software did you use back then?

S: We used google sheets (and still do, but not in the same way anymore), and then all the “editing” was done via Twitch’s Highlight system at the time, which then had a checkbox to auto upload the video to youtube.
Then there were a few people with youtube access that could enable monetization and other things like that.
Twitch’s Highlight editor (especially at the time we used it (DB6/DB7)) was extremely painful to use on very long VODs, there was no “seek by time”. You had to use the slider and kinda position it where you wanted and then just wait and be quick on the cut button.
We didn’t actually start capturing the run ourselves until Twitch’s overzealous VOD muting happened ( 2014-08-06 ) and we had to figure out a new way of doing things.

g: And just two years down the line, you had to start making your own tools. What was that like?

S: When that happened we had roughly 3 months to figure out what to do. dave_random put in a ton of time figuring out how to capture the run (using livestreamer which has since been forked to streamlink). The way it worked during DB8 was that the video would get uploaded to youtube with a couple of minutes on either side of the video, then the video editors would go in and edit the video using youtube’s editor.
Then we found out that there is a limit tied to youtube’s editor and you can only have a set number of videos “editing” at once, then you get locked out of the editor for a while, we (the VST and DesertBus in general) always end up being en edge case.
MasterGunner wrote the first version of our own editor so we could edit the video before it got sent to youtube.
The VST website itself also didn’t exist till DB9, a lot of the poster revisions archive only exists because J and myself kept copies of all the revisions.

g: After DB9 is when you started trying to backup the previous years, right?

S: Yea, so (internally) the VST had talked about archival problems over the years, and when Anubis169 went to DB9 (in person) to volunteer, he also went with the express purpose to grab as many of the Desert Bus files as he could find at the time.
When he got back home he and I went over the files he managed to get and he sent me a copy of everything he grabbed, I also spent the time trying to figure out how uStream had stored all the DB1 and DB2 clips then downloaded a copy of all of them.
It turned out to be a very good time to do that, since for a few years later IBM bought uStream and deleted all archives

g: So that looks to be all of the history questions I have. Now for the fun part: describe the process of archiving a Bus.

S: As in as it currently stands?
As in “how did this year work”?

g: Yes. How would the process of archival go as it currently stands?

S: well, that’s a hard one, haha

g: Not surprised, given the scope of the event we’re dealing with.

S: For old stuff: I already (also) flew to Victoria to get the missing DB3 and DB4 files, which was successful, the next time I go it will be to recover old prize data (I’m in the process or making a full prize archive)
For what we “regularly” capture setting up for a new run goes pretty much like this:
The current version of the wubloader (our capture architecture) (re-written by ekimekim, and chrusher after DB12) is used by ekim all year, so he reguarly workes on it and fixes it to work around anything twitch changes.
~3 months before the run we will put out the signup form to the internal VST place, a week or so after that it will be the IRC channel, and the LRR discord (in the desertbus channel)
During about 2 of those 3 months I’ll finish up any new stuff for the VST website I’m working on, so they are ready for the run.
The VST Org. Committee has meetings during the year to talk about any changes we want to make to any of the internal tools of our external facing stuff, the first of which usually happens in June for a new run.
Sorry, some of this is out of order.

g: You’re fine.

S: If we need to inform regular VST members of some major changes we’ve made we schedule meetings over some form of video chat for them to signup for and then to do a quick check over on everything new so we can get any questions answered and have everyone on the same page (usually about 30min per-session).
New people will get a separate training session that’s usually about 90-120 min in length, new people will always start off as “spreadsheeters”, we don’t rotate in new editors until they’ve been around for a couple years and they kind of have a feel for what we do.
For setting up the VST website for the run, there’s a separate “front page” for when the run is live, and also the head node is dropped back to being non-public and we stand up a 8-node globally located DNS cluster to handle the load, it runs on a 5 minute update cycle because late-run when there is a new poster revision a full update and sync takes about 3 & 1/2 minutes.
For setting up a “new year” on the VST site, there’s an amount of manual work, but it’s only about 3 hours or so, really depends on how many of the other things we track are setup at that point.

g: Other stuff being things like the charts, the clock, chat stats?

S: The clock is pretty easy, the chat stats require the chat capture be enabled and going, the graphs require that the donation capture is going already, so that can’t be setup till donations re-set, the gamejam page can’t be setup till Famout gets the gamejam on itch.io setup, the gameshows page can’t be setup till Noy2222 actually knows what gameshows he’s doing this year. The spreadsheet page can’t be setup until all the google docs spreadsheets are setupThe posters page requires that Lunsford has the poster that they’re drawing be setup somewhere for us to query. And the animated poster evolution page requires 3 poster revisions before that works at all. The postermap page is updated manually when I have time to draw/trace and then import the new postermap(ImageMap) of the poster Lunsford has drawn (still not done with this year’s yet)
For standing up our capture infastructure: There’s at minimum 2 nodes on “hardware” as in non-virtualized, that are “editing” nodes, only one of which actually uploads to the youtube channel, after that (usually) all the other nodes are virtualized and (this year) were provided by 6 different people, these are completely separate from the VST website nodes.
We also always try to make sure all the capture nodes are geographically distributed so a random network outage can’t hurt us, and so if one node misses a segment the other 7 can fill in the blank.
Once all of those are stood up and working, they’re all imported into the monitoring dashboard so we know if one of them has a problem. Usually we have all the capture (and website) hardware stood up about 1 week before the run starts. Then we have time to test it and ekimekim and chrusher (Wubloader), ElementalAlchemist (who coded the new version of thrimbletrimmer, our editor), and myself (website) have time to fix any bugs / finish any new features. At that point all the approved (new and old) VST members will also get an invite to the private sheet. Also, we invite any new VST members to the private chat space we use during the run (self-hosted Zulip).

We also spend a lot of time working on the schedule (as part of the signup form people tell us their available hours), people are limited to a max of 6 hour shifts, so scheduling ~60 people over a week where we try to maintain ~8 active people on the private spreadsheet is actually quite complex. ekimekim created a python script to create an initial rough guess, we then have a VST Org meeting to smooth things out. The resulting (schedule) spreadsheet is then given to everyone on the VST so they can check for errors in their personal schedule, and then (for during the run) the schedule’s csv is fed in to a zulip bot that announces who’s going on/off shift. Also, once I have the VST website nodes setup I give J access to one (geographically) near him, that he also uses for his own capture of the chat, twitch, and poster revisions, that way if the VST website head-node misses something we have a backup copy with the stuff J sets up as well.
I think that’s it, everything I’m thinking of now is post-run stuff. Oh, J also runs a capture of all of the Prize data that we preserve for the (upcoming) prize archive.

g: Well, that’s one heck of a process. Mind going into the tech used, like Wubloader and thrimbletrimmer?

S: Sure, wubloader is a ekimekim/chrusher coded Python3 project that is a custom HLS capture (as in we capture every 2-second long .ts segment twitch sends out when the stream is going). It uses PostgreSQL for backend databases, nginx for web, FFMPEG for doing the actual video editing, and docker for easier node deployment. It uses the GoogleDocs API for interaction with the private sheet and the YouTube API for uploading to youtube / managing the playlists.
Thrimbletrimmer (Now coded by ElementalAlchemist) uses HLS.js and a bunch of custom javascript and html for the editing interface, it can make multiple cuts (so we can cut the middle out of a video) and has the ability to add the chapter markers to the description if we want to do that on a longer video.

g: So the upload process is done by Thrimbletrimmer?

S: When someone makes an edit in Thrimbletrimmer, it talks to thrimshim (that then passes the actual edits on to the wubloader that then does the edit and uploads the video to youtube.
thrimshim is a piece of the wubloader that is kind of like an API to all the data in wubloader
so when a video is marked in the private sheet for upload there is a link to thrimbletrimmer that has a UUID on it, that thrimbletrimmer passes to thrimshim so it knows which video segments correspond to the requested video. On the way back it’s like “edit this uuid with the following edits, here’s the video title and description”

g: So what about the Twitch chat? How do you grab that?

S: Twitch chat is captured in 2 ways: via irssi (unix command line IRC client) both J and myself run a capture using that, and (this year) ekimekim coded up a capture for it that also captures all the meta-data for each chat message.
So before the run starts, J and I just setup our irssi sessions on 2 respective servers, and just leave them running in screen. ekimekim runs his custom capture off 2 of the wubloader nodes

g: So how has this setup evolved over time?

S: For chat capture or video capture?

g: Both.

S: Chat capture has largely been the same, old (pre-DB6) chat capture was just done with whoever made the capture’s IRC program (mIRC or IceChat).
Video capture has changed quite a bit, the first version of the wubloader (DB8) [coded by dave_random] was done with livestreamer (saved to mp4 files) and only did rough cuts, the 2nd version (DB9-12) came with Thrimbletrimmer (coded by MasterGunner) which did specific cuts, but also still used livestreamer as the capture source, During DB12 we discovered Twitch had implemented a “24-hour watch limit” which caused both capture nodes to miss part of Ash & Alex’s driver intro. Starting with DB13 ekimekim and Chrusher implemented a custom home-grown capture method that attaches directly to the HLS stream, and resets itself every so often to avoid the 24 hour watch limit.
The new capture metod saves all the 2-second long .ts files as they come out and each node fills in for any other node that got a partial or missed segment, now the capture nodes are a cluster instead of independent.
The editing process has gone from using twitch highlights -> using youtube’s editor -> using a custom editor coded by MasterGunner -> using a further improved editor coded by ElementalAlchemist.
Compared to using twitch or youtube’s editor the ones coded by MasterGunner and ElementalAlchemist are an amazing improvement, and much less buggy.

g: Anything else you want to add? Advice for somebody considering a similar archival project? Other than “don’t”?

S: Honestly: “Start on the first year of the event”, “Ask us (the VST) for advice”, “Preserve everything, backtracking to get something you missed is always more painful”
“Don’t try to do it by yourself”
The VST only works because of all the people involved and learning from the mistakes we’ve made over the years.

g: Any closing thoughts before I wrap up this interview?

S: All of this would never have happened if LoadingReadyRun wouldn’t have put “First Annual” on the website banner back in 2007 as a joke.

g: Thank you for your time!

– glmdgrielson, along for the eight hour, mind-numbingly dull drive