We Just Rescued Thousands of Unpublished YouTube Captions

SCC

Community contributions were a YouTube feature that allowed viewers to provide translations and captions for their favorite channels. Last year, YouTube decided the feature had problems and began restricting it. This year, believing it to be broken beyond salvation, they decided to axe it for good.

Unfortunately, removing the feature also meant deleting caption drafts, some of which were complete but stuck in review. So Data Horde started a project, with a lot of assistance from Archive Team, to grab as many of these unpublished captions as possible.

Although the feature was officially removed on September 28, we were able to keep accessing caption drafts for a whole month, until the endpoint was cut off at around 8 PM (UTC) on October 28. In total, we scanned nearly 52 million items for drafts, including videos, channels, playlists, and mix playlists. We also have two or three other bulky collections that archivists retrieved manually. In the coming days we will be organizing these drafts, in the hope of giving them their own collection on the Internet Archive.

We also have a few other ideas for what to do with this massive collection of captions, so stay tuned over the next couple of days to find out! In the meantime, check out our YouTube Captioner's Toolkit page for information on alternatives to the retired community captions feature.
