captions – Data Horde https://datahorde.org

YouTube Community Contributions Archive Now Available: A Look at the Stats
https://datahorde.org/youtube-community-contributions-archive-now-available-a-look-at-the-stats/
Fri, 05 Mar 2021 22:22:55 +0000

The YouTube Community Contributions Archive is now available on the Internet Archive! You can download the entire collection, or simply search for and download files for a particular video. The collection is composed of 4096 ZIP archives which contain 406,394 folders and 1,361,998 files. Compressed, the collection is 3.83GB, and once decompressed, the collection is 9.46GB.
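
To give a feel for the size of the collection, here is a minimal Python sketch that derives a few averages from the figures quoted above. The variable names are made up for illustration, and the results are only as precise as the rounded sizes in the post; they say nothing about how folders are actually distributed across the ZIP archives.

```python
# Figures quoted in the post. Variable names are illustrative only.
ZIP_ARCHIVES = 4096
FOLDERS = 406_394
FILES = 1_361_998
COMPRESSED_GB = 3.83
DECOMPRESSED_GB = 9.46

# Simple averages; the real distribution across archives may be uneven.
folders_per_zip = FOLDERS / ZIP_ARCHIVES
files_per_folder = FILES / FOLDERS
expansion_factor = DECOMPRESSED_GB / COMPRESSED_GB

print(f"Average folders per ZIP archive: {folders_per_zip:.1f}")
print(f"Average files per folder: {files_per_folder:.2f}")
print(f"Decompressed size is about {expansion_factor:.2f}x the compressed size")
```

On average, that works out to roughly 99 folders per ZIP archive and a little over 3 files per folder.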

YouTube Community Contributions allowed users to create and translate closed captions/subtitles, titles, and descriptions of YouTube videos uploaded by channels who enabled the feature. Users could optionally choose to be credited for their captioning contributions.

While over 50 million videos were scanned, community contributions data was found for only 406,394 of them, indicating that the feature was used on only a small portion of the videos on YouTube. Of these, 198,609 videos had YouTube Community Contributions enabled but only had captions or metadata provided by the uploader. That leaves 207,785 videos in the collection with community-contributed captions or metadata, or approximately 0.4% of the videos that were scanned while creating this archive. This was likely because the community contributions feature was hard to discover in the YouTube interface, which limited the number of people who were aware of it.
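
The arithmetic behind these figures can be checked with a short Python sketch. The post only says "over 50 million" videos were scanned, so the round 50,000,000 used here is an assumption, and the function name is made up for illustration.

```python
# Figures quoted in the post. SCANNED_VIDEOS is an assumed round number,
# since the post only says "over 50 million".
SCANNED_VIDEOS = 50_000_000
VIDEOS_WITH_DATA = 406_394
UPLOADER_ONLY = 198_609


def community_share(scanned, with_data, uploader_only):
    """Return (videos with community content, percentage of scanned videos)."""
    community = with_data - uploader_only
    return community, 100 * community / scanned


community, pct = community_share(SCANNED_VIDEOS, VIDEOS_WITH_DATA, UPLOADER_ONLY)
print(f"{community:,} videos, about {pct:.1f}% of those scanned")
```

Because the true number of scanned videos is somewhat above 50 million, the real share is slightly lower than this estimate, but it still rounds to about 0.4%.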

Breaking down these numbers further, 80,746 videos had community-contributed draft metadata, 127,164 had community-contributed draft captions, 38,440 videos had community-contributed published metadata, 93,499 videos had community-contributed published captions, 179,366 videos had uploader-provided published metadata, and 225,466 videos had uploader-provided published captions.

YouTube Community Contributions allowed those who contributed captions to optionally be credited for their published work. 38,939 videos had credits for published captions created by the community. Captioning credits became inaccessible two weeks before the rest of the community contributions data did, but even taking that into account, the number of videos with captioning credits was quite low. It is estimated that, had the credits remained accessible until the rest of the community contributions feature was made inaccessible, about 80 thousand videos would have been found to have had credits.

The community contributions feature supported 196 languages, though not all languages were used equally. Below is a chart of the 25 most popular supported languages, and the number of videos that contain at least 1 file for each language (graphing all of the languages did not display well). This chart includes uploader-provided content.

When the query excludes the uploader-provided content, we see significant shifts in the 25 most popular supported languages.

This shift indicates that community contributions were often used to translate content.

A look at the language distribution of the collected metadata, including uploader-provided metadata, appears to be similar to the distribution of languages in the overall collection.

A look at just the community-provided metadata shows a slightly different distribution.

The distribution of captioning languages, including uploader-provided captions, is similar to the collection overall.

The distribution of captioning languages, excluding uploader-provided captions, also resembles the overall collection.

It is also interesting to look at the distribution of the draft community captions and metadata that were collected in comparison to the published community captions and metadata.

The published community contributions data appears to be more evenly distributed across languages compared to the draft community contributions data.

Some users contributed many captions and were credited for their work on many videos. In total, 83,563 channels appeared in our credits collection. On average, a channel was credited on 1.47 caption tracks. 55 channels were credited for more than 50 caption tracks, and 14 channels were credited for more than 100 caption tracks! The top three channels which were credited on the most caption tracks in our collection created 255, 522, and 912 caption tracks, respectively.

Thank you to everyone who contributed to this project! Additional details about the collection itself are available in the Internet Archive item description. If you have any additional questions, please feel free to join the project Discord server!

We Just Rescued Thousands of Unpublished YouTube Captions
https://datahorde.org/we-just-rescued-thousands-of-unpublished-youtube-captions/
Fri, 30 Oct 2020 21:33:41 +0000

Community contributions were a feature on YouTube which allowed viewers to provide translations and captions for their favorite channels. Last year, YouTube realized that the feature had some problems and so began restricting it. And this year, believing the feature to be broken beyond salvation, they decided to axe it for good.

Unfortunately, in the process they were going to be getting rid of caption drafts, some of which were complete but stuck in review. So, Data Horde initiated a project to grab as many of these unpublished captions as possible, with a lot of assistance from Archive Team.

Although the feature was officially removed on September 28, we were able to continue accessing caption drafts for a whole month, until the endpoint was cut off at around 8 PM (UTC) on October 28. In total, we scanned and pooled nearly 52 million items, including videos, channels, playlists, and mix playlists, for drafts. We also have two or three other bulky collections which were retrieved manually by archivists. In the coming days we will be working on organizing these drafts, with the hope of giving them a collection on the Internet Archive.

We also have a few other ideas in mind for what to do with this massive collection of captions, so stay tuned these next couple of days to find out! In the meantime, check out our YouTube Captioner’s Toolkit page for information on alternatives to the retired community captions feature.

Help Archive YouTube’s Community Contributions!
https://datahorde.org/help-archive-youtubes-community-contributions/
Sat, 26 Sep 2020 00:27:21 +0000

YouTube is removing their community contributions feature on September 28. In case you haven’t already heard, that’s the feature which allows viewers to add captions/subtitles, translated titles and video descriptions on videos. And YouTube seems to be pretty insistent on removing the feature, despite massive backlash.

Now, although YouTube have given their word to keep published community captions (and other contributions) online, there’s a small detail many people have overlooked. Last year, YouTube restricted the feature to only allow uploaders to publish contributions. As such, there are many, many unpublished captions and title/description translations stuck in review. Furthermore, no information has been given on the fate of Caption Credits (people who opted to have their name shown).

Although unpublished on videos, these contributions are still visible in the community captions editor. So for the last few days we have been developing a tool to archive all this data! We have finally reached a mature enough stage that anyone reading this can now run the “YouTube Community Contribution Archiver” (YCCA) on their computer, to help us collect as many of these contribution drafts as we can:

https://github.com/Data-Horde/ytcc-archive

Ideally, it’s best if channels accept the contributions on their own videos, not only from a moral standpoint but also because archiving them this way hides some information (formatting, stylization, authors of unpublished captions, etc.). So beyond archiving these, we’ve also done our best to reach out to content creators across YouTube.

The good news is that we won’t be archiving these for naught: projects such as YouTubexternal CC will likely be a new home for these captions and other content which have been trapped for so long.

We also have a Discord server where we are coordinating all these efforts, so feel free to hop on board if you have any questions or want to just meet the team!

Discord

Good luck archiving! Click here to view current stats.

For further context on how we wound up in this predicament, check out our YouTube CC History series:

Part 1: Unusual Beginnings on Google Video

Part 2: Pioneering Online Accessibility

Part 3: Scaling the Waterfall, Captions for All

Part 4: The Untold Story of why YouTube is removing Community Contributions
