So the annotation thing. You remember that, right? Well, here is how the worker seemed to function. Note that I’m getting this information from a cursory glance (and chatting with one of the devs). I know it worked because I had three of them running at any given time. But how? Uh, *shrug*
Let’s get started, shall we? In the worker (at omarroth/archive), the code starts by creating a new `Worker` class. This is our basic worker.
The `run` function creates a `BatchProcess` and calls its `run`. *sigh* So what does that do? Well, it asks the server for a batch, pulls it up from a database, and retrieves the annotations for each video in the batch …which is done in yet another class, this one called `AnnotationProcess`.
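To make that chain of calls easier to follow, here is a minimal sketch of the shape as I understand it. It is written in Python purely for illustration and is not the real worker's code: only the three class names and `run` come from the description above, while `fetch_batch`, `fetch_annotations`, and the idea of passing them in as plain functions are my own assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-ins for the network calls; not the real worker's API.
FetchBatch = Callable[[], List[str]]      # asks the server for a batch of video IDs
FetchAnnotations = Callable[[str], str]   # fetches the annotations for one video


@dataclass
class AnnotationProcess:
    """Retrieves the annotations for a single video."""
    video_id: str
    fetch_annotations: FetchAnnotations

    def run(self) -> str:
        return self.fetch_annotations(self.video_id)


@dataclass
class BatchProcess:
    """Asks the server for a batch, then handles each video in it."""
    fetch_batch: FetchBatch
    fetch_annotations: FetchAnnotations

    def run(self) -> Dict[str, str]:
        results: Dict[str, str] = {}
        for video_id in self.fetch_batch():
            process = AnnotationProcess(video_id, self.fetch_annotations)
            results[video_id] = process.run()
        return results


@dataclass
class Worker:
    """The basic worker: its run() just delegates to a BatchProcess."""
    fetch_batch: FetchBatch
    fetch_annotations: FetchAnnotations

    def run(self) -> Dict[str, str]:
        return BatchProcess(self.fetch_batch, self.fetch_annotations).run()
```

Passing the two fetch functions in keeps the sketch runnable without a server: `Worker(lambda: ["a", "b"], lambda v: "annotations for " + v).run()` walks the same three layers with fake data.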
So what does `AnnotationProcess` do? It makes a request to YouTube to get the annotations. (The URL in the repository was changed after the fact. By me. Interesting.) How it gets those annotations is the clever part: to make sure the worker is functioning properly, there is a trust system. A fresh worker won’t actually get a new batch; it’ll get one that’s already been verified. As it gives more valid responses, it’s more likely to get a new video. This way, the likelihood of getting garbage data is minimized, which is important for an archival project.
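I never saw the server side of the trust system, so take this as a guess at the idea rather than how it was actually done. The sketch below assumes trust is a single number per worker that rises with each valid response and doubles as the probability of being handed new work; `TrustTracker`, the step size, the cap, and the reset on a bad response are all invented for illustration.

```python
import random
from typing import Optional


class TrustTracker:
    """Hypothetical per-worker trust score kept by the server."""

    def __init__(self, step: float = 0.1, max_trust: float = 0.9) -> None:
        self.trust = 0.0            # a fresh worker starts with no trust
        self.step = step            # assumed gain per valid response
        self.max_trust = max_trust  # assumed cap, so spot checks never stop

    def pick_batch_kind(self, rng: Optional[random.Random] = None) -> str:
        """Return "new" with probability equal to the current trust."""
        source = rng if rng is not None else random
        # random() is in [0, 1), so a trust of 0.0 always yields "verified".
        return "new" if source.random() < self.trust else "verified"

    def record_result(self, matched_known_good: bool) -> None:
        """Update trust after the worker answers an already-verified batch."""
        if matched_known_good:
            self.trust = min(self.max_trust, self.trust + self.step)
        else:
            self.trust = 0.0        # assumed penalty: start over
```

Capping the score below 1.0 means even a long-running worker keeps getting the occasional already-verified batch, which is one way garbage data could be caught later on.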
Once all the videos in a batch have been downloaded, they’re verified with the server and then uploaded to DigitalOcean Spaces, a cloud storage service. This goes on ad infinitum until YouTube decides to pull the plug.
And that is what (I think) the annotation worker did.
– glmdgrielson