This is an interview with Sokar, of the Video Strike Team, conducted over IRC. The VST is an archival group of a rather small scope: preserving a particular stream, Desert Bus For Hope.
Desert Bus For Hope is a yearly charity stream, running under the premise that the more money that is received, the longer the stream goes on for, and the more the organizers have to play the dullest video game imaginable. So dull, in fact, that Desert Bus has never been officially released, actually. This year’s fundraiser gave us a stream that is just exactly an hour under one week: 6 days and 23 hours! So this was a very long stream with a lot of data to preserve. So follows the story of how that happens.
Note: DBx refers to the iteration of Desert Bus for Hope. For example, this year, 2021, was DB15. Also, I have only minimally modified our interview, by adding in links where applicable and making minor spelling corrections.
glmdgrielson: So first off, outside of the VST, what are you up to?
Sokar: I do video editing and Linux server security / software support, and various other (computer related) consulting things for “real work”.
g: So you started off with just the poster for DB6, according to the site, correct? How did that work?
S: We didn’t actually start doing the interactive postermaps till DB8, then I worked backwards to do all the previous ones (still not done).
The VST itself started formally during DB6.
g: That’s when Graham contacted MasterGunner, who presumably contacted you, correct?
S: Tracking the run live in some way was a confluence of ideas between me, Lady, and other members of the chat at the time, Graham knew how to get ahold of Gunner about making live edits because he was one of the people who helped with the DB5 torrent.
I honestly don’t remember how most of the DB6 VST crew was put together, it was very last minute.
g: Do you know anything about how that torrent was made?
S: The first DB5 torrent?
S: Kroze (one of the chat mods) was physically at DB5 and brought a blank external HDD with him specifically for recording the entire stream, then after the run Fugi and dave_random worked together to create the torrect (with all the files split into 15min chunks) I wanna say the torrent file was initially distributed via Fugi’s server.
DB5 was the first time the entire run was successfully recorded.
LRR had previously toyed with the idea (DB3, but ended up doing clips instead) and steamcastle attempted to record all of DB4 but was unsuccessful.
g: And DB6 was the first year the VST existed. What was that first year like?
S: The first year was VERY short handed, we only had 14 people, a LOT of the “night” shifts were either just me by myself or me and BillTheCat
We really didn’t know what we were doing, the first rendition of the DB6 sheet didn’t even have end times for events.
There was just “Start Time” “Event Type” “Description” and “Video Link”.
At some point we (the VST) will just re-spreadsheet the entire run, because we were so short handed we missed a lot of things, when I went back to make the DB6 postermap I think I ended up uploading ~17(ish) new videos because that was how many posterevents weren’t even on the sheet.
g: What sort of equipment or software did you use back then?
S: We used google sheets (and still do, but not in the same way anymore), and then all the “editing” was done via Twitch’s Highlight system at the time, which then had a checkbox to auto upload the video to youtube.
Then there were a few people with youtube access that could enable monetization and other things like that.
Twitch’s Highlight editor (especially at the time we used it (DB6/DB7)) was extremely painful to use on very long VODs, there was no “seek by time”. You had to use the slider and kinda position it where you wanted and then just wait and be quick on the cut button.
We didn’t actually start capturing the run ourselves until Twitch’s overzealous VOD muting happened ( 2014-08-06 ) and we had to figure out a new way of doing things.
g: And just two years down the line, you had to start making your own tools. What was that like?
S: When that happened we had roughly 3 months to figure out what to do. dave_random put in a ton of time figuring out how to capture the run (using livestreamer which has since been forked to streamlink). The way it worked during DB8 was that the video would get uploaded to youtube with a couple of minutes on either side of the video, then the video editors would go in and edit the video using youtube’s editor.
Then we found out that there is a limit tied to youtube’s editor and you can only have a set number of videos “editing” at once, then you get locked out of the editor for a while, we (the VST and DesertBus in general) always end up being en edge case.
MasterGunner wrote the first version of our own editor so we could edit the video before it got sent to youtube.
The VST website itself also didn’t exist till DB9, a lot of the poster revisions archive only exists because J and myself kept copies of all the revisions.
g: After DB9 is when you started trying to backup the previous years, right?
S: Yea, so (internally) the VST had talked about archival problems over the years, and when Anubis169 went to DB9 (in person) to volunteer, he also went with the express purpose to grab as many of the Desert Bus files as he could find at the time.
When he got back home he and I went over the files he managed to get and he sent me a copy of everything he grabbed, I also spent the time trying to figure out how uStream had stored all the DB1 and DB2 clips then downloaded a copy of all of them.
It turned out to be a very good time to do that, since for a few years later IBM bought uStream and deleted all archives
g: So that looks to be all of the history questions I have. Now for the fun part: describe the process of archiving a Bus.
S: As in as it currently stands?
As in “how did this year work”?
g: Yes. How would the process of archival go as it currently stands?
S: well, that’s a hard one, haha
g: Not surprised, given the scope of the event we’re dealing with.
S: For old stuff: I already (also) flew to Victoria to get the missing DB3 and DB4 files, which was successful, the next time I go it will be to recover old prize data (I’m in the process or making a full prize archive)
For what we “regularly” capture setting up for a new run goes pretty much like this:
The current version of the wubloader (our capture architecture) (re-written by ekimekim, and chrusher after DB12) is used by ekim all year, so he reguarly workes on it and fixes it to work around anything twitch changes.
~3 months before the run we will put out the signup form to the internal VST place, a week or so after that it will be the IRC channel, and the LRR discord (in the desertbus channel)
During about 2 of those 3 months I’ll finish up any new stuff for the VST website I’m working on, so they are ready for the run.
The VST Org. Committee has meetings during the year to talk about any changes we want to make to any of the internal tools of our external facing stuff, the first of which usually happens in June for a new run.
Sorry, some of this is out of order.
g: You’re fine.
S: If we need to inform regular VST members of some major changes we’ve made we schedule meetings over some form of video chat for them to signup for and then to do a quick check over on everything new so we can get any questions answered and have everyone on the same page (usually about 30min per-session).
New people will get a separate training session that’s usually about 90-120 min in length, new people will always start off as “spreadsheeters”, we don’t rotate in new editors until they’ve been around for a couple years and they kind of have a feel for what we do.
For setting up the VST website for the run, there’s a separate “front page” for when the run is live, and also the head node is dropped back to being non-public and we stand up a 8-node globally located DNS cluster to handle the load, it runs on a 5 minute update cycle because late-run when there is a new poster revision a full update and sync takes about 3 & 1/2 minutes.
For setting up a “new year” on the VST site, there’s an amount of manual work, but it’s only about 3 hours or so, really depends on how many of the other things we track are setup at that point.
g: Other stuff being things like the charts, the clock, chat stats?
S: The clock is pretty easy, the chat stats require the chat capture be enabled and going, the graphs require that the donation capture is going already, so that can’t be setup till donations re-set, the gamejam page can’t be setup till Famout gets the gamejam on itch.io setup, the gameshows page can’t be setup till Noy2222 actually knows what gameshows he’s doing this year. The spreadsheet page can’t be setup until all the google docs spreadsheets are setupThe posters page requires that Lunsford has the poster that they’re drawing be setup somewhere for us to query. And the animated poster evolution page requires 3 poster revisions before that works at all. The postermap page is updated manually when I have time to draw/trace and then import the new postermap(ImageMap) of the poster Lunsford has drawn (still not done with this year’s yet)
For standing up our capture infastructure: There’s at minimum 2 nodes on “hardware” as in non-virtualized, that are “editing” nodes, only one of which actually uploads to the youtube channel, after that (usually) all the other nodes are virtualized and (this year) were provided by 6 different people, these are completely separate from the VST website nodes.
We also always try to make sure all the capture nodes are geographically distributed so a random network outage can’t hurt us, and so if one node misses a segment the other 7 can fill in the blank.
Once all of those are stood up and working, they’re all imported into the monitoring dashboard so we know if one of them has a problem. Usually we have all the capture (and website) hardware stood up about 1 week before the run starts. Then we have time to test it and ekimekim and chrusher (Wubloader), ElementalAlchemist (who coded the new version of thrimbletrimmer, our editor), and myself (website) have time to fix any bugs / finish any new features. At that point all the approved (new and old) VST members will also get an invite to the private sheet. Also, we invite any new VST members to the private chat space we use during the run (self-hosted Zulip).
We also spend a lot of time working on the schedule (as part of the signup form people tell us their available hours), people are limited to a max of 6 hour shifts, so scheduling ~60 people over a week where we try to maintain ~8 active people on the private spreadsheet is actually quite complex. ekimekim created a python script to create an initial rough guess, we then have a VST Org meeting to smooth things out. The resulting (schedule) spreadsheet is then given to everyone on the VST so they can check for errors in their personal schedule, and then (for during the run) the schedule’s csv is fed in to a zulip bot that announces who’s going on/off shift. Also, once I have the VST website nodes setup I give J access to one (geographically) near him, that he also uses for his own capture of the chat, twitch, and poster revisions, that way if the VST website head-node misses something we have a backup copy with the stuff J sets up as well.
I think that’s it, everything I’m thinking of now is post-run stuff. Oh, J also runs a capture of all of the Prize data that we preserve for the (upcoming) prize archive.
g: Well, that’s one heck of a process. Mind going into the tech used, like Wubloader and thrimbletrimmer?
S: Sure, wubloader is a ekimekim/chrusher coded Python3 project that is a custom HLS capture (as in we capture every 2-second long .ts segment twitch sends out when the stream is going). It uses PostgreSQL for backend databases, nginx for web, FFMPEG for doing the actual video editing, and docker for easier node deployment. It uses the GoogleDocs API for interaction with the private sheet and the YouTube API for uploading to youtube / managing the playlists.
g: So the upload process is done by Thrimbletrimmer?
S: When someone makes an edit in Thrimbletrimmer, it talks to thrimshim (that then passes the actual edits on to the wubloader that then does the edit and uploads the video to youtube.
thrimshim is a piece of the wubloader that is kind of like an API to all the data in wubloader
so when a video is marked in the private sheet for upload there is a link to thrimbletrimmer that has a UUID on it, that thrimbletrimmer passes to thrimshim so it knows which video segments correspond to the requested video. On the way back it’s like “edit this uuid with the following edits, here’s the video title and description”
g: So what about the Twitch chat? How do you grab that?
S: Twitch chat is captured in 2 ways: via irssi (unix command line IRC client) both J and myself run a capture using that, and (this year) ekimekim coded up a capture for it that also captures all the meta-data for each chat message.
So before the run starts, J and I just setup our irssi sessions on 2 respective servers, and just leave them running in screen. ekimekim runs his custom capture off 2 of the wubloader nodes
g: So how has this setup evolved over time?
S: For chat capture or video capture?
S: Chat capture has largely been the same, old (pre-DB6) chat capture was just done with whoever made the capture’s IRC program (mIRC or IceChat).
Video capture has changed quite a bit, the first version of the wubloader (DB8) [coded by dave_random] was done with livestreamer (saved to mp4 files) and only did rough cuts, the 2nd version (DB9-12) came with Thrimbletrimmer (coded by MasterGunner) which did specific cuts, but also still used livestreamer as the capture source, During DB12 we discovered Twitch had implemented a “24-hour watch limit” which caused both capture nodes to miss part of Ash & Alex’s driver intro. Starting with DB13 ekimekim and Chrusher implemented a custom home-grown capture method that attaches directly to the HLS stream, and resets itself every so often to avoid the 24 hour watch limit.
The new capture metod saves all the 2-second long .ts files as they come out and each node fills in for any other node that got a partial or missed segment, now the capture nodes are a cluster instead of independent.
The editing process has gone from using twitch highlights -> using youtube’s editor -> using a custom editor coded by MasterGunner -> using a further improved editor coded by ElementalAlchemist.
Compared to using twitch or youtube’s editor the ones coded by MasterGunner and ElementalAlchemist are an amazing improvement, and much less buggy.
g: Anything else you want to add? Advice for somebody considering a similar archival project? Other than “don’t”?
S: Honestly: “Start on the first year of the event”, “Ask us (the VST) for advice”, “Preserve everything, backtracking to get something you missed is always more painful”
“Don’t try to do it by yourself”
The VST only works because of all the people involved and learning from the mistakes we’ve made over the years.
g: Any closing thoughts before I wrap up this interview?
S: All of this would never have happened if LoadingReadyRun wouldn’t have put “First Annual” on the website banner back in 2007 as a joke.
g: Thank you for your time!
– glmdgrielson, along for the eight hour, mind-numbingly dull drive