internationalization – Data Horde https://datahorde.org Join the Horde! Fri, 23 Jul 2021 10:15:55 +0000
YouTube Community Contributions Archive Now Available: A Look at the Stats https://datahorde.org/youtube-community-contributions-archive-now-available-a-look-at-the-stats/ https://datahorde.org/youtube-community-contributions-archive-now-available-a-look-at-the-stats/#respond Fri, 05 Mar 2021 22:22:55 +0000 https://datahorde.org/?p=2091 The YouTube Community Contributions Archive is now available on the Internet Archive! You can download the entire collection, or simply search for and download files for a particular video. The collection is composed of 4096 ZIP archives which contain 406,394 folders and 1,361,998 files. Compressed, the collection is 3.83 GB, and once decompressed, the collection is 9.46 GB.

YouTube Community Contributions allowed users to create and translate closed captions/subtitles, titles, and descriptions of YouTube videos uploaded by channels who enabled the feature. Users could optionally choose to be credited for their captioning contributions.

While over 50 million videos were scanned for community contributions data, such data was found for only 406,394 videos, indicating that the feature was used on only a small portion of the videos on YouTube. Of these, 198,609 videos had YouTube Community Contributions enabled but only had captions or metadata provided by the uploader. The remaining 207,785 videos had community-contributed captions or metadata, approximately 0.4% of the videos that were scanned while creating this archive. This was likely because the community contributions feature was hard to discover in the YouTube interface, which limited the number of people who were aware of it.
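For readers who want to check the arithmetic, here is a minimal sketch of how the 0.4% figure can be reproduced from the numbers quoted above. The denominator is an assumption: the post only says "over 50 million" videos were scanned, so a round 50 million is used here, and the variable names are purely illustrative.

```python
# Figures quoted in the post. SCANNED is an assumption: the post says
# "over 50 million", so the real share is, if anything, slightly lower.
SCANNED = 50_000_000
WITH_DATA = 406_394       # videos with any community contributions data
UPLOADER_ONLY = 198_609   # videos whose data came only from the uploader

# Videos with at least some community-contributed captions or metadata.
community = WITH_DATA - UPLOADER_ONLY

# Share of all scanned videos, as a percentage.
share_pct = 100 * community / SCANNED

print(f"{community:,} videos with community-contributed data")
print(f"about {share_pct:.1f}% of scanned videos")
```

With these inputs the subtraction gives 207,785 videos, and the share rounds to 0.4%.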

Breaking down these numbers further, 80,746 videos had community-contributed draft metadata, 127,164 had community-contributed draft captions, 38,440 videos had community-contributed published metadata, 93,499 videos had community-contributed published captions, 179,366 videos had uploader-provided published metadata, and 225,466 videos had uploader-provided published captions.

YouTube Community Contributions allowed those who contributed captions to optionally be credited for their published work. 38,939 videos had credits for published captions created by the community. This number is low in part because captioning credits became inaccessible two weeks before the rest of the community contributions data did. Even so, it is estimated that, had the credits remained accessible until the rest of the feature was shut off, only about 80 thousand videos would have been found to have credits.

The community contributions feature supported 196 languages, though not all languages were used equally. Below is a chart of the 25 most popular supported languages, and the number of videos that contain at least 1 file for each language (graphing all of the languages did not display well). This chart includes uploader-provided content.

When the query excludes the uploader-provided content, we see significant shifts in the 25 most popular supported languages.

This shift indicates that community contributions were often used to translate content.

A look at the language distribution of the collected metadata, including uploader-provided metadata, appears to be similar to the distribution of languages in the overall collection.

A look at just the community-provided metadata provides a slightly different distribution of data.

The distribution of captioning languages, including uploader-provided captions, is similar to the collection overall.

The distribution of captioning languages, excluding uploader-provided captions, also resembles the overall collection.

It is also interesting to look at the distribution of the draft community captions and metadata that were collected in comparison to the published community captions and metadata.

The published community contributions data appears to be more evenly distributed across languages compared to the draft community contributions data.

Some users contributed many captions and were credited for their work on many videos. In total, 83,563 channels appeared in our credits collection. On average, a channel was credited on 1.47 caption tracks. 55 channels were credited for more than 50 caption tracks, and 14 channels were credited for more than 100 caption tracks! The top three channels which were credited on the most caption tracks in our collection created 255, 522, and 912 caption tracks, respectively.

Thank you to everyone who contributed to this project! Additional details about the collection itself are available in the Internet Archive item description. If you have any additional questions, please feel free to join the project Discord server!

We Just Rescued Thousands of Unpublished YouTube Captions https://datahorde.org/we-just-rescued-thousands-of-unpublished-youtube-captions/ https://datahorde.org/we-just-rescued-thousands-of-unpublished-youtube-captions/#respond Fri, 30 Oct 2020 21:33:41 +0000 https://datahorde.org/?p=1690 Community contributions were a feature on YouTube which allowed viewers to provide translations and captions for their favorite channels. Last year, YouTube realized that the feature had some problems and so began restricting it. And this year, believing the feature to be broken beyond salvation, they decided to axe it for good.

Unfortunately, in the process they were going to be getting rid of caption drafts, some of which were complete but stuck in review. So, Data Horde initiated a project to grab as many of these unpublished captions as possible, with a lot of assistance from Archive Team.

Although the feature was officially removed on September 28, we were able to continue accessing caption drafts for a whole month, until the endpoint was cut off at around 8 PM (UTC), October 28. In total, we scanned and pooled nearly 52 million items, including videos, channels, playlists, and mix playlists, in search of drafts. We also have two or three other bulky collections which were retrieved manually by archivists. In the coming days we will be working on organizing these drafts, with the hopes of giving them a collection on the Internet Archive.

We also have a few other ideas in mind for what to do with this massive collection of captions, so stay tuned these next couple of days to find out! In the meantime, check out our YouTube Captioner’s Toolkit page for information on alternatives for the retired community captions feature.

YouTube is hiding Attributions to Fan-Captioners and Translators who wanted to be credited https://datahorde.org/youtube-is-hiding-attributions-to-fan-captioners-and-translators-who-wanted-to-be-credited/ https://datahorde.org/youtube-is-hiding-attributions-to-fan-captioners-and-translators-who-wanted-to-be-credited/#respond Thu, 15 Oct 2020 09:00:46 +0000 https://datahorde.org/?p=1589 On YouTube you sometimes come across videos which have subtitles for a bunch of languages. Take for instance this Japanese music video with translations in 20 languages! Have you asked yourself where these come from, is the uploader a polyglot or something?

These translations were contributed by fans of the channel, and if you were to go into the video description a few days ago you would have seen authors listed for some –but not all– of the languages. However, if you check the description now, you will notice that YouTube is now hiding all of these caption authors.

And to make matters worse, there is a good reason why the original list did not have all translators listed. To be able to show up on the list, contributors would have to check a box titled “Credit my contribution”, which was turned off by default. So that means that anyone who was showing up on the list had explicitly volunteered to appear non-anonymously.

This comes following YouTube’s deprecation of the community contributions feature. While YouTube has assured users that they will keep published translations online, it would seem that they do not wish translators to receive any credit for these captions beyond this date.

If you as a captioner or content creator have been adversely affected by the removal of community contributions, check out our YouTube Captioner’s Toolkit for alternatives and useful resources.

[Obsoleted] YouTube removed community translations, but there is a workaround! https://datahorde.org/how-to-submit-accept-community-translations-on-youtube-a-work-around/ https://datahorde.org/how-to-submit-accept-community-translations-on-youtube-a-work-around/#comments Mon, 12 Oct 2020 20:04:36 +0000 https://datahorde.org/?p=1555 Edit: On October 28 around 8 PM (GMT) the old caption editor was shut down for good, blocking off further contributions. For external alternatives to community captioning, check out our Captioner’s Toolkit page.

For the past few years YouTube had been supporting community captions, a feature which allowed users to submit captions or translations for videos of other channels. On September 28 the feature was removed and the menu to access it was hidden.

However, you might still spot new videos with community contributions published after September 28. Take a look at this video uploaded on October 9. Notice that the Caption author is Dark_Kuroh, different from the video uploader.

But how, time travel? As it so happens, even with all the menus hidden, it is still possible to access the old captions editor. This method requires the uploader to know where to check, so it’s best that if you are submitting captions or translations using this method, you let the uploader know the language and the video.


How to submit community captions

So you want to caption or translate someone else’s video… Assuming that the channel still has community captions enabled, go to the following URL:

youtube.com/timedtext_editor?action_mde_edit_form=1&v={video code}&lang={language code}

Example: http://youtube.com/timedtext_editor?action_mde_edit_form=1&v=vCxz2lSeer4&lang=en

where {video code} is the video’s ID (the part after v= in the video’s URL) and {language code} is the abbreviation for the language you want to translate into. Fortunately, you can also switch between languages later, so if you don’t know the abbreviation you can use en to open the editor for English and then switch to your actual language through the Switch Language button.
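As an illustration only, the URL template above could be filled in programmatically along these lines. This is a sketch: the function names and the EDITOR_BASE constant are made up for this example, and the endpoint itself has since been shut down, so the resulting link will no longer open the editor.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base of the old caption editor URL, as given in the example above.
# The endpoint is retired; it is kept here purely for illustration.
EDITOR_BASE = "http://youtube.com/timedtext_editor"


def video_id_from_watch_url(watch_url: str) -> str:
    """Return the value of the `v` query parameter of a watch URL."""
    query = parse_qs(urlparse(watch_url).query)
    return query["v"][0]


def build_editor_url(video_id: str, lang: str = "en") -> str:
    """Fill in the {video code} and {language code} placeholders."""
    params = {"action_mde_edit_form": 1, "v": video_id, "lang": lang}
    return f"{EDITOR_BASE}?{urlencode(params)}"


video_id = video_id_from_watch_url("https://www.youtube.com/watch?v=vCxz2lSeer4")
url = build_editor_url(video_id, lang="en")
print(url)
```

Defaulting lang to en mirrors the tip above: open the editor in English first, then switch to the language you actually want.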

When you’re done, don’t forget to submit by clicking on the Submit Contributions button in the upper right corner.


How to accept community captions

Previously, you were able to view community submissions from the Community Tab on YouTube Studio. Unfortunately, these are now hidden. So you will need to have an idea of which videos and languages to check.

If you hadn’t enabled community contributions before, it’s not too late! Simply go into YouTube Studio > Videos and choose the videos you would like to enable contributions on. Go into Edit > Community Contributions and switch it on. Lastly, don’t forget to click on “Update Videos”.

You, as an uploader, can also use the URL youtube.com/timedtext_editor?action_mde_edit_form=1&v={video code}&lang={language code} to access the caption/translation submissions you have received. A good place to start could be some of your most viewed videos, and you should definitely pay attention to your subscribers to see if they are trying to tell you to accept any of their submissions.

All you have to do when you find a community submission is click on the Publish or Publish edits button in the upper right corner.


While YouTube is still working on their permissions system and the community is banding together to find alternatives of their own, it’s important to endure through this transition period. So here’s hoping this tutorial helps you continue to add/receive translations on your videos for a little longer…

Help Archive YouTube’s Community Contributions! https://datahorde.org/help-archive-youtubes-community-contributions/ https://datahorde.org/help-archive-youtubes-community-contributions/#respond Sat, 26 Sep 2020 00:27:21 +0000 https://datahorde.org/?p=1478 YouTube is removing their community contributions feature on September 28. In case you haven’t already heard, that’s the feature which allows viewers to add captions/subtitles, translated titles and video descriptions on videos. And YouTube seems to be pretty insistent on removing the feature, despite massive backlash.

Now although YouTube have given their word to keep published community captions (and other contributions) online, there’s a small detail many people have overlooked. Last year, YouTube restricted the feature to only allow uploaders to publish contributions. As such, there are many many unpublished captions, title/description translations stuck in review. Furthermore, no information is given on the fate of Caption Credits (people who opted to have their name shown).

Although unpublished on videos, these contributions are still visible in the community captions editor. So for the last few days we have been developing a tool to archive all this data! We have finally reached a mature enough stage that anyone reading this can now run the “YouTube Community Contribution Archiver” (YCCA) on their computer, to help us collect as many of these contribution drafts as we can:

https://github.com/Data-Horde/ytcc-archive

Ideally, it’s best if channels accept the contributions on their own videos, not only from a moral standpoint but also because our archiving method loses information (formatting, stylization, authors of unpublished captions, etc.). So beyond archiving these, we’ve also done our best to reach out to content creators across YouTube.

The good news is that we won’t be archiving these for naught: projects such as YouTubexternal CC will likely be a new home for these captions and other content which have been trapped for so long.

We also have a Discord server where we are coordinating all these efforts, so feel free to hop on board if you have any questions or want to just meet the team!

Discord

Good Luck archiving! Click here to view current stats.

For further context on how we wound up in this predicament, check out our YouTube CC History series:

Part 1: Unusual Beginnings on Google Video

Part 2: Pioneering Online Accessibility

Part 3: Scaling the Waterfall, Captions for All

Part 4: The Untold Story of why YouTube is removing Community Contributions

The Untold Story of why YouTube is removing Community Contributions; YouTube CC History pt. 4 https://datahorde.org/a-history-of-youtubes-closed-captions-part-iv-downfall/ https://datahorde.org/a-history-of-youtubes-closed-captions-part-iv-downfall/#comments Fri, 18 Sep 2020 20:58:47 +0000 https://datahorde.org/?p=1442 Continued from Part III

Last time, we saw YouTube introduce new ways to add closed captions, as well as some UI improvements. Owing to these changes, the quantity of captioned videos increased drastically between 2009 and 2015. However, outside of accuracy improvements to automatic captioning, it cannot be said that quality was improving to match this quantity.

This lag in quality can be attributed to a problem many online platforms suffer from: bundling closed captioning with subtitles. If you recall, YouTube had taken note of the growing internationalization of their platform and so began diverting resources accordingly. But at some point, international accessibility had become the greater priority. And this is noticeable through gradual changes in the UI, such as relabeling “automatic-captions” as “auto-generated” or naming the captions panel in YouTube Studio as “Subtitles”.

But as we will soon see, these changes were not only in name…


Although the terms captions and translated subtitles are used interchangeably, there is a slight nuance between the two. They differ in what additional information they provide. Closed captions are meant to complement the audio, and that means that in addition to a transcript of what is being said, there should also be sound cues. 

Translated subtitles, on the other hand, are meant to break language barriers, and more often include footnotes such as definitions or translations of proper names. It would not be correct to say that one is superior to the other, but they are meant for different purposes.

Yet as one can imagine, bundling these two into a single feature would inevitably lead to an ambiguity of which form of transcription to adhere to. And that is precisely what happened on YouTube. Even if there were more and more captions by the day, these were not being authored with the needs of the Deaf Community in mind. 

In reaction to these low-quality captions, the NoMoreCraptions (sic) movement was born. The movement was headed by Michael Lockrey (aka TheDeafCaptioner) and Rikki Poynter, who came up with the name independently of one another. Yet theirs was a common goal: To spread awareness to combat low-quality captions.

If Google and YouTube aren’t going to do the right thing on accessibility, then it looks like it’s going to be down to you and me.

TheDeafCaptioner, 2014

Lockrey was among the earliest critics of YouTube’s low-quality captioning and other shortcomings in accessibility. His focus was on the inadequacy of automatically generated captions. Around the same time that YouTube was testing the closed-beta for community contributions, Lockrey launched nomorecraptions.com. A spiritual successor to CaptionTube, the web-app allows users to caption any YouTube video of their choosing (regardless of permissions) but they can only keep the caption files to themselves. A few years later, it also received a feature to auto-import automatic captions, so captioners would not have to start from scratch.

Of course, automatic captions were not the only problem. There were also the issues in manmade captioning, which became a lot more frequent once community contributions became available for all channels in late 2015. This is when Rikki Poynter comes into the picture.

The good thing is that with the announcement of the fan contribution subtitles and captions, more big YouTubers have had their viewers help them caption their videos, and that’s pretty awesome. Here’s the bad: A lot of these big YouTubers, okay a handful – so far that I’ve seen – of these big YouTubers, they don’t check over their stuff and there’s a lot of incorrect captions, not only incorrect, but there’s a lot of jokes that get put in there, and it basically looks like it’s all for laughs.

Rikki Poynter, Stop Adding Jokes and Useless Commentary Into YouTubers’ Captions!

Poynter would begin an extensive campaign to promote proper captioning and to fight against caption vandalism. Sadly much to her dismay, she was not able to make a whole lot of noise. It was rare to see any of these big channels taking responsibility for these low quality, sometimes even malicious, captions. 

You see, although community contributions were designed to have an independent contribution and review phase, all it took was a few alt-accounts for a person to force their captions to be accepted. It was an Achilles’ heel, and it would come back to bite YouTube in the near future…


2017 was a turning point for YouTube. At this point, over one billion videos had been automatically captioned! Approximately 15 million views came from automatic captions per day! Confident in their technology, YouTube slowly also began to make the automatic translation feature increasingly more prominent.

Whether or not it was due to a vision of eventually fully automating captions and subtitles, around June they shut down their professional translation network. As if to fill the void, YouTube all of a sudden began actively promoting community contributions, which they had been updating silently up until this point, through video tutorials:

(Click here if the above video is inaccessible)

Now that people were actually becoming aware of this feature, increasingly more channels began enabling community contributions. Consequently, as viewers caught on, community contributions would flourish. All of a sudden, it became possible for nearly any channel to be able to expand into an international audience once it reached a large enough following.

Alas, all the flaws and exploits which were present in 2016, were still there and would continue to remain there for some time to come. And over time, spammy and outright malicious captions grew increasingly in prominence. While Poynter and other critics kept trying to raise awareness on the issue, nobody was giving them the time of day. It was going to take a blunder so audacious and so mind-boggling, that people could not miss it.


On August 20, 2019, 3kliksphilip uploaded a video titled The Much Problem Of Translate on his kliksphilip side-channel, where he announces his decision to disable community contributions. He laments that he is tired of having to back-translate these so-called contributions to filter swear-words, self-promotions, or altered links that people were slipping in. A big YouTuber was finally bringing attention to a years-long problem, but this was not the straw that broke the camel’s back. That would come only a few hours later, on August 21.

So I was on the French version of YouTube, and I decided to click on a PewDiePie video, because who doesn’t want to watch a PewDiePie video, right? And this is what happens when you go on a PewDiePie video in a different language, that’s not English. 

So it looks perfectly fine before you do click on it, you know the title says “We BUILT the GREATEST thing in Minecraft”. But when you click on the video the title updates and says “SUBSCRIBE TO CHANNEL AT THE DESCRIPTION”. But then at the very top of the description, there’s a link which takes you to a completely different channel which says subscribe next to it, and it’s not PewDiePie’s, it has nothing to do with PewDiePie it’s a completely random YouTube channel.

JT, Pewdiepie’s Translators ruin his channel…

YouTuber JT would, unlike 3kliksphilip, actually call out a caption saboteur who had done a lazy job covering his tracks. This, however, only led to an escalation of hostilities. Before anyone knew it, Twitter and Reddit were rioting. PewDiePie himself would end up disabling his community contributions and explaining why in a video, ironically, titled Best Week Ever, currently standing at over 18 million views!

YouTube finally took notice and at first tried to downplay their responsibility:

Failing to calm the crowd, YouTube then took a radical decision, which they never repealed, to make it so that any and all community contributions had to be verified by the video uploader. Any spam/misconduct sneaking into their captions would now be their sole responsibility:

As a result of these events, it became much more difficult for contributions to ever be published. Many submissions are currently stuck in limbo, waiting indefinitely for uploaders to publish them.
In the end, the toiling translators were demonized and the Deaf Community were still not getting properly captioned videos…


Was that the end of it? You wish! The team responsible for captioning in 2019 might have had some familiar faces, but their attitude was nothing like the team in 2009. Gone was the image of pioneers who were working tirelessly to bring accessibility to the internet who truly cared about what they were doing. All that remained was a façade, buried somewhere under the behemoth of all those robotic subtitles. 

Now, it would absolutely not be fair to say that the captions team was acting irresponsibly of their own accord, but rather under the wishes of their company at large. When people were ranting on Twitter, they were ranting to YouTube, not any of those amazing people. And it would seem that YouTube, as a company, chose to intervene here.

YouTube’s decision to only allow uploaders to publish contributions was an irrational one, hastily made in the heat of the moment with little regard for consequences that would have mattered to a team whose priority was providing accessibility rather than protecting reputation.

It was not long before people were asking for alternatives such as contribution moderators. But YouTube had other plans in mind… On April 23, 2020 (the anniversary of the first video on YouTube no less), the Creator Insider channel wanted to have a talk about The Future of Community Captions? They explain that they are considering completely removing community captions, seeing as content creators are unhappy with the feature, and seeing that contributions are not used “that much”.

Highly unusual for a video about accessibility features, where Google often boasts about their inclusivity, the only person featured in this video responsible for closed captions is Product Manager James Dillard. And that’s far from the only thing unusual about this announcement: Ashlee Boyer wrote an entire Twitter thread counting the oddities and unfortunate implications throughout the video.

Of course, since this was only a Creator Insider video, it didn’t get that big of an audience, only being noted by Deaf content creators as a possible red flag. Unchallenged, the decision was officialized by the end of July. The scheduled removal of Community Captions on September 28, was reported through the YouTube Help page, YouTube Support Community, and finally a now-unlisted Creator Insider video which provides the most detailed explanation for the course of action.

First of all, we get to see a glimpse of a change in the hierarchy concerning the captions team. It’s now a branch under the Creator Product team and we have the lead on that, Ariel Bardin to accompany the captions product manager James Dillard from the previous video. While a lot of points are demystified in this video, such as how this removal decision came in the wake of updates to the YouTube Studio and how YouTube would work to roll out a “role” system for people in need of captioning*, ultimately it misses the mark. The reasons provided here do not sufficiently justify the removal of this feature and drew much ire from the few people who did manage to find the link to the video.

* Minor note here, in YouTube’s realization of the idea, there are roles which allow for specific people to be assigned permission to contribute captions at all, instead of moderators in charge of approval.


Public outcry ensued soon after, with many YouTubers criticizing the decision. But as it has always been with captions, it’s been difficult to win the numbers game, as while these are moderately big channels they are not going to bring in 18 million views. There is currently a change.org petition with over 500 thousand signatures. #DontRemoveYoutubeCCs eventually trended on Twitter at one point. Even a number of Deaf charities have pleaded with YouTube to reconsider. 

In hopes of appeasing those unhappy with the decision, YouTube has offered eligible channels a 6-month, later 1-year, Amara subscription. They also haphazardly began rolling out the new “roles” system, which not only does not support closed captioning yet, but was also missing closed captions on its announcement video for the first few days.

No matter how hard they might try to win people over, the decision to do away with community contributions altogether instead of attending to the issues which have made the feature exploitable is nothing short of betrayal. Not only a betrayal of users and content creators on YouTube’s platform but even of Google’s own employees, some of whom waited almost a decade for community contributions to become a feature.


Even though we have reached the present day, the history of closed captions is not over. It is merely turning over a new page. Where YouTube has fallen short, projects such as externally hosting community contributions have come about:

As for us here at Data Horde, we are working to collect as many of those contributions stuck in limbo, never to be accepted by the video uploader, as we can. In addition, we will also be grabbing caption credits, as YouTube made no comment on what was to come of these.

If you would be interested in this project, head on over to the Discord server below:

https://discord.gg/nacFvU6

And finally, to all of the video uploaders out there, please take your time to accept any unpublished non-spam community captions on your channel while you still can. This is not only important because it feels like the right thing to do, but also because caption formatting and stylization are maintained only when an uploader accepts a contribution.

(Note that the editor preview might lie by rendering the captions differently. The English captions in the video below are an example of how this will work if a formatted contribution is accepted correctly.)

Make sure to also follow Data Horde in the coming days, as we bring more updates on the current situation and share resources to help content creators and viewers who will be harmed by the removal of community contributions…

Scaling the Waterfall, Captions for All; YouTube CC History pt.3 https://datahorde.org/a-history-of-youtubes-closed-captions-part-iii-scaling-the-waterfall/ https://datahorde.org/a-history-of-youtubes-closed-captions-part-iii-scaling-the-waterfall/#respond Sat, 12 Sep 2020 19:09:55 +0000 https://datahorde.org/?p=1399 [Thumbnail taken from Niagara Falls, Canada]

Continued from Part II

When YouTube’s automatic-captioning was first demonstrated, it was only available for a select few channels. Around March of 2010, YouTube decided that they were ready and made automatic-captioning available to all channels! This also meant that automatic captions could now be translated as well.

The feature was introduced to provide closed captioning for videos that were not already captioned, but the question is did it get the job done? The reception was mixed. No matter how much this innovation was appreciated, the frequent errors were noticeable. Furthermore, it was only available in English, so only for a fraction of the videos on YouTube.

All the while, the rapid flow of videos raged on… At the time it was estimated that 20 hours worth of video were being uploaded to YouTube by the minute! That rate has only grown since. Not to mention that this is only accounting for YouTube, what about the rest of the internet?

In a CNN interview Ken Harrenstien, chief engineer of the captions team, once remarked:

What I would like to see happen is that people would see what we are doing here, and realize, “Oh, this is really useful,” and all start doing the same thing 

Whether he knew it or not, Harrenstien’s dream was not so far off…


An important event later into 2010 was the US Congress passing the Twenty-First Century Communications and Video Accessibility Act (CVAA). The act, which aimed to improve accessibility to technology, introduced regulations such as requiring video programming that is closed captioned on TV to be closed captioned when distributed on the internet. This meant that news and entertainment media that were transitioning from television to online streaming would have to bring their readily available closed captions along. So not only was YouTube going to get more caption uploads, but other websites now had to accommodate these changes by adding or improving their closed captioning interface.

The act also mandated video devices be designed with specifications to support closed captioning. In light of this, YouTube would bring closed captions to mobile in 2010, and introduce support for additional subtitling. In addition, they also introduced support for captioning formats used throughout the industry in 2011. Previously, positioning and stylization had required the use of YouTube’s native annotations system, but thanks to this change video distributors now had a way of uploading their own stylized captions.

The worth of stylization is best showcased through video. CPC Closed Captioning placement and formatting by 18hands

However, beyond the need for disability accessibility, there was an even bigger need for international accessibility. During Google I/O 2011, a gadget that allowed for live broadcasting of transcripts was showcased to demonstrate the capabilities of the Captions API. To be clear, these were not automatically generated captions but professional transcripts being written in real-time. That being said, using automatic translation, viewers from all around the world were able to translate these captions into their own languages. In fact, English captions accounted for only 27.33% of caption viewership; the remaining 72.66% of caption viewers were translating!

The top captioning languages during Google I/O 2011 were English, Spanish, Portuguese, French and Russian.

While all of these developments helped bring existing captions to more people, the question of how to bring high-quality captions for new videos, or videos specifically produced for the internet, still remained. To this end YouTube would pursue three approaches:

  1. Improving auto-captioning
  2. Incentivizing professional captioning/translation
  3. Providing channels with the ability to crowdsource their captions

Let’s talk a bit more about each of these in further detail…


Improving auto-captioning

If Automatic Captioning was to be able to provide captions on any video, it had to work for more than only English. The second language to offer auto-captioning was Japanese in 2011. Korean and Spanish support arrived in 2012; with German, Italian, French, Portuguese, Russian, and Dutch also receiving automatic captions later into the year.

You now have around 200 million videos with automatic and human-created captions on YouTube, and we continue to add more each day to make YouTube accessible for all.

Hoang Nguyen, Software Engineer on the Captions Team, via https://blog.youtube/news-and-events/youtube-automatic-captions-now-in-six/

It didn’t mean a whole lot to be able to automatically caption 10 languages unless you could do it right. So, as speech recognition technology improved, Google and YouTube upgraded their speech recognition algorithms. Two major upgrades were when the speech recognition technique was switched to a Deep Neural Network model and later to a more specialized LSTM-RNN model, in 2012 and 2015, respectively.

During one of the most iconic moments in Google’s history at Google I/O 2015, CEO Sundar Pichai announced that they had lowered their word error rate in speech recognition down to 8%!

Things were getting a lot better, but was it enough? Some criticized the accuracy metrics as easily manipulable and automatic captions still were not an alternative for manmade captioning. So, YouTube began looking into manual captioning alternatives.


Incentivizing professional captioning/translation

A harsh reality of online captioning in the earlier years of the world wide web was that it was often distributed under-the-counter, in forms that were generally associated with piracy. Much like with today’s streaming services like Netflix, subtitles for movies or shows were often distributed regionally and it was common for subtitles in a particular language to never show up in most regions. Not to mention, a lot of this media never even received subtitles or captioning at all, giving rise to a rich culture of amateur translations and fansubs. The scars of this era are visible to this day. For example, .srt, which is now an industry-standard subtitle format, originated with a program literally called SubRip.

But what about media which was being produced for the internet? Predictably it wasn’t long before fansubbing or fan-captioning websites showed up for videos, such as Overstream. Still, these websites could not shake off the negative connotation surrounding them. As for professional subtitling and captioning, which was a fledgling industry when it came to more traditional media, it was struggling to go online. Captioning vendors were out there, but unable to reach the market.

This paradigm was challenged in 2010, with the founding of a platform called Universal Subtitles, today known as Amara.

Amara’s early user interface

Here’s the problem: web video is beginning to rival television, but there isn’t a good open resource for subtitling.

Here’s our mission: we’re trying to make captioning, subtitling, and translating video publicly accessible in a way that’s free and open, just like the Web.

Amara Blog https://blog.amara.org/2010/04/13/subtitles-and-captions-for-every-video-on-the-web/

Right off the bat, Amara made it clear that they wanted to become a champion of online accessibility, through collaborative captioning and subtitling. And unlike YouTube’s Closed Captioning team, Amara was determined not to be bound to a single website.

YouTube did take notice; it is very interesting to note that only a few weeks after Amara’s alpha-launch, YouTube partnered with the DCMP to endorse professional closed caption vendors as “YouTube Ready”.

You may be able to manage creating captions for your videos on your own, but sometimes you have too many videos or your video has elements that need special care. Today, thanks to support from the Described and Captioned Media Program (DCMP.org), we’re pleased to roll out a new “YouTube Ready” designation for professional caption vendors in the United States. The YouTube Ready logo identifies qualified vendors who can help you caption your YouTube videos.

Naomi Black, Caption Evangelist, via YouTube Blog

By 2013, YouTube had decided to launch their own translation network. Uploaders could now request translations on their videos, directly from YouTube’s UI and would be redirected to partner vendors.

Although short-lived, the network showed YouTube’s determination to encourage professional translation and captioning, instead of amateurish captioning. While this might have been an ideal solution for big-budget content creators, it wasn’t viable as a general solution.


Providing channels with the ability to crowdsource their captions

YouTube did already have crowdsourcing, before Amara, albeit not natively. There was CaptionTube, and even the older YouTube Subtitler later received some sharing features. The catch was, none of these community captions would show up on the uploader’s channel, unless the captioner(s) got into contact with the uploader.

To make things easier, in 2012 YouTube introduced link-sharing, which would allow the uploader to share a link with captioners for them to be able to directly upload captions to their account. But it was still a tedious process.

Meanwhile, Amara had expanded their frontiers and was now offering “translation workspaces” for organizations such as TED with Amara Enterprise. This gave websites the ability to “moderate” community contribution, instead of having it all in the open.

And as Amara kept teasing YouTube about their error-prone automatic captions, it became clear that YouTube needed to do something. Their answer is what we know today as “community contributions”.

Reminiscent of the automatic captioning launch, YouTube would take things slowly this time round. Starting in 2014 they would silently launch community contributions for Google and YouTube’s own channels. Gradually, channels like Crash Course, Barely Political, and Kurzgesagt would be among the first to enable community contributions. It’s also worth noting the similarities to Amara Enterprise, in that YouTube’s community contributions also operate on an independent transcription and review phase.

Community contributions for all channels would eventually go live in late 2015, to little fanfare. It was strange to see YouTube not make an announcement, seeing that you needed to enable the feature from your settings. Nonetheless, this form of captioning also had its merits, and it has come to be appreciated as more and more channels discovered that YouTube had such a feature…


Through these different captioning methods, YouTube amazingly did manage to scale the waterfall, to some extent. In 2015, YouTube’s product manager Matthew Glotzbach reported that around 25% of all videos on the site had been captioned in one form or another.

Such was the golden age of closed captioning on YouTube… But was it to last?

Join us next week to find out!

Pioneering Online Accessibility; YouTube CC History pt.2 https://datahorde.org/a-history-of-youtubes-closed-captions-part-ii-bringing-it-all-together/ https://datahorde.org/a-history-of-youtubes-closed-captions-part-ii-bringing-it-all-together/#respond Fri, 04 Sep 2020 16:55:18 +0000 https://datahorde.org/?p=1346 Continued from Part I

When we last left off, the stage was set! YouTube had gathered a group of amazing talent to not only bring closed captions to YouTube but to make it monumental!

You had top UI engineers, speech recognition veterans and the hard-boiled closed caption team of Google Video. Above all, the captions team was mostly made up of people who were deaf, hearing impaired, or who had a loved one who was deaf or hard of hearing. You could be sure they were going to give it their all.


The team immediately got to work and their efforts bore their first fruit in late August of 2008.

You can add captions to one of your videos by uploading a closed caption file using the “Captions and Subtitles” menu on the editing page. To add several captions to a video, simply upload multiple files. If you want to include foreign subtitles in multiple languages, upload a separate file for each language. There are over 120 languages to choose from and you can add any title you want for each caption. If a video includes captions, you can activate them by clicking the menu button located on the bottom right of the video player. Clicking this button will also allow viewers to choose which captions they want to see.

The YouTube Team, via the official YouTube Blog https://youtube.googleblog.com/2008/08/new-captions-feature-for-videos.html

Some of the first channels to feature closed captions were the channels of BBC Worldwide, CNET, UC Berkeley, MIT and Gonzodoga.

CAPTIONS AND SUBTITLES TEST by TheDawgProductions is unique in that it is quite possibly the first video made where an uploader was consciously aware of the feature, unlike the above examples which were captioned after upload. Amazingly, it predates the blog announcement for CC.

The YouTube Blog was nothing short of a beast, with over 2 million subscribers at the time. Still, feeling that the announcement wasn’t loud enough, the team decided to also make a video announcement on the official YouTube channel:

An interesting part of this video announcement, in addition to the rickroll around 0:34 into the video, is the mention that captions and subtitles are also helpful for people who speak other languages. This was nothing new, since you could add caption tracks in multiple languages even back in the Google Video days, but the way this remark is worked into the video suggests that they are teasing a new feature.

With the announcement of machine translation in November, we got to see just what that feature was! Viewers could now translate closed captions into whatever language they chose, via Google Translate. The feature has changed a bit over the years, but in its earliest form you could translate any captions which the user had uploaded into any of the languages available on Google Translate: 

Demonstration of Captions and Translation by captionmic

It’s worth noting, however, that these translations were not permanent; they were designed to be dynamic as Google Translate kept improving over time. If the uploader wanted to ensure viewers saw a translation in a particular language, they would still have to add that in themselves.

On top of this, just a few days later, closed caption support was added to embedded videos, so you no longer had to be on YouTube to view closed captions. The captions team was on fire!


After so many updates in quick succession, the captions team would fall silent for a few months, not because they were exhausted, but because they had got to work on something big.

Nonetheless, the first half of 2009 saw some interesting in-house projects utilizing captions. This was fitting since after so many updates improving the viewing experience on YouTube, it was now about time that some work was done to improve the process of content creation. The first of these projects was CaptionTube, a richer editor than YouTube’s built-in caption editor, in 2009 anyway:

CaptionTube also had another interesting feature, reminiscent of caption contribution. Unlike YouTube where you need to have the permission of the uploader to be able to caption a given video, you could caption any video of your choosing. Even if the uploader didn’t want closed captions on their video, you could keep a copy of it for yourself. And if the uploader did want captions, you could just export your captions and email it to them.

I had features for converting various subtitle formats, uploading and previewing them, setting the language, alongside a YouTube video in a UI that looked like Premiere or Final Cut. I wrote CaptionTube and the Python API client library myself, but I had nothing to do with the internal caption infrastructure.

I think roughly 1.5 million videos were captioned with it (a drop in the bucket). I supported the service for ~3 years until YouTube had better internal support, I don’t know how big that team was. After that, I turned it down as they had added the collaboration features. I had thought about adding crowd-sourcing to CaptionTube, but I didn’t have the time and the internal caption team was working on it.

John Skidgel, creator of CaptionTube, personal communication via email. 

Another project was the aptly named google-video-captions, meant to be a dataset of transcripts for videos on Google’s own YouTube channels.

The google-video-captions project has two goals:
* To provide a public corpus of Creative Commons licensed captions that were transcribed from Google videos.
* To enable community-based translation of Creative Commons licensed caption files for these same videos.

Naomi Black, creator and maintainer of google-video-captions, project description

This project was led by Naomi Black, who at the time was managing YouTube and Google’s video channels. Unfortunately, this project died a lot sooner than CaptionTube, as updates ceased around August and eventually Google Code, where the project was maintained, went defunct altogether. 

Yours truly has taken the liberty of exporting what remains of this project to GitHub, in hopes that someday the idea might be revived. With improvements to the YouTube API over the years, the task of transcript retrieval should now be a whole lot easier.


All the while, Google continued to make strides in speech recognition. In March, Voicemail transcription debuted for Google Voice:

It met with a mixed reception, to say the least. While admittedly innovative, the accuracy wasn’t all that good. Secondly, the processing of people’s speech raised privacy concerns, as indicated by comments left on the video and elsewhere. Google was not going to be as hasty the next time they unveiled a major speech-to-text product.

hey keith this is matt mail drink trying out the anti pants go go voice i translator away this is making into an S M S this is gonna be too long later sent you a transcript on now this would be normal so i wonder how accurate this thing will be in translating when i have to say and today i had a salad and okay double cheeseburger and what else i am comforting over some brainstorm notes and when she say something that michael birthday has if you extend the well and yeah i goes hello so the translate laughing to okay i think that’s about

Google Voice – Transcript Test by Spudart and Sparx

After months of working silently, the captions team unleashed their pièce de résistance: Automatic Captioning! Having learned their lesson from Google Voice, this feature was initially exclusive to a select few channels, primarily educational ones.

A day later, they would give a thorough demonstration of this, and another feature called Automatic Timing, to an audience in their office in Washington D.C. Members in the audience included accessibility leaders from the NAD, Gallaudet University, the AAPD and even Marc Okrand. This is one of those videos on YouTube, which you can’t help but wonder why it hasn’t reached a million views yet! If you’ve got an hour to spare, it is a must-watch!

After a brief overview by Jonas Klink, then accessibility product manager at Google, we cut to Vint Cerf who delivers the Introduction. He opens with how “to organize the world’s information and make it accessible and useful” entails accessibility for the deaf, hearing-impaired, visually impaired, or motor-impaired.

So I want to tell you, first of all, why accessibility’s personally important to me. Sigrid, who is in the audience over there — wave your hand, Sigrid — and I are both hearing-impaired. Sigrid was totally deaf for 50 years. She now has two cochlear implants. And they work wonderfully well. They work so well, we had to buy a bigger house, because she wanted bigger parties, because she could hear. So this is a technology which is spectacular. I’ve been wearing hearing aids since I was 13. You can do the math. That’s 53 years.

So both of us care a great deal about how technology can help people with various impairments get access to information and be connected with the rest of the world. So quite apart from my job at Google, I have great personal interest in what we’re talking about today.

Vint Cerf, Announcement on Accessibility and Innovation, 3:41 

At one point he makes a slip-up, stating that YouTube introduced captions in 2006 when it was actually Google Video which introduced closed captions. Next, after talking about their history together, he hands the microphone to Ken Harrenstien, who you might recall was the chief engineer on closed captions during the Google Video days.

Ken Harrenstien, who had been waiting for this moment for at least three years, continues with a showcase of caption features that have been added to YouTube over the past year: settings to adjust the size of captions, to turn the background off, etc. But at the end of this section, his optimism begins to fade as he addresses the sheer number of uncaptioned videos.

To provide a visualization, he takes out a labeled bottle of water and tells the audience to assume this bottle represents all the videos that are currently captioned. Then he opens a clip of Niagara Falls from YouTube and tells the audience this represents all of the videos being uploaded to YouTube.

This is our problem. Remember what Vint said earlier? Every minute we stand here and talk, people are uploading 20 to 23 hours of video. Not minutes. Hours. Not 23 videos themself. We’re talking hours. So tons. And that’s every minute, every day. Every month. It just — it’s coming in.

So the question is, who’s going to bottle that water?

Ken Harrenstien, Announcement on Accessibility and Innovation, 25:08 

How to keep up with this perpetual flow? He then proceeds to play a clip from that year’s Google I/O, with captions switched on. He then turns to the audience to ask if they notice anything different.

YEEESSSS!

People who notice the mistakes eventually guess that the captions are machine generated, much to Ken Harrenstien’s amusement. Automatic captioning is finally on YouTube! Having learned their lessons from Google Voice, instead of launching the feature for all users, initially automatic captioning would only be available to a few other partner channels.

Back then the process was a lot slower: you got a warning saying that the feature was experimental, and then you would have to wait some time for the transcript to be generated. They were taking things easy. It would be months before the feature was allowed on other channels.

Still, it was a million times better than nothing, and what’s more, there was an alternative for all the other channels. Automatic timing was a new feature that allowed users to upload an existing transcript and would generate timestamps to align the text with speech. Believe it or not, this feature is still on YouTube to this day!

The final portion of the demonstration is Naomi Black, the same Naomi who had worked on building a public corpus of captions, showcasing these features from the uploader’s perspective.


As the audience applauds, the question lingers: “Will automatic captioning be able to keep up with the astronomical rate of video uploads?” It brought hope, that was for sure, but the unreliable accuracy still left something to be desired.

Join us next week, when we talk about the ingenious attempts to contain this waterfall! 

Unusual Beginnings on Google Video; YouTube CC History pt.1 https://datahorde.org/a-history-of-youtubes-closed-captions-part-i-unusual-beginnings/ https://datahorde.org/a-history-of-youtubes-closed-captions-part-i-unusual-beginnings/#respond Fri, 28 Aug 2020 22:01:30 +0000 https://datahorde.org/?p=1125 Have you ever watched a video where you couldn’t understand a word of what was being said? Maybe the volume was too low or the people were whispering. Perhaps you are hard-of-hearing or even deaf. Or it was just in a foreign language. These are no reasons to be ashamed! You’re only one click away from seeing what was being said using the CC button! 

Today, the closed caption “CC” is one of the most recognizable icons in the world. A good part of that is due to the advent of captioning for online video and streaming services. In particular, YouTube once stood out as an early adopter and strong advocate for closed captioning.  

Option to contribute subtitles/CC on YouTube

Alas, in recent years they have made decisions that have led to them going from caption-hero to villain. Their latest act of treachery is their decision to remove community contributions, a feature that allowed viewers to submit captions for videos they wished to transcribe or translate.

As a sort of countdown, until community contributions are gone for good, I am going to be recounting the history of closed captions on YouTube, over the next few weeks. I hope our readers will find this to be a fascinating retrospective on a much-overlooked technology…


Classic YouTube Logo from 2005

Our story begins in 2005, with the inception of two websites: YouTube and Google Video. Now, online video was by no means a new phenomenon; the concept of a viral video was already a decade old at this point. But the video culture of the time was quite different. Videos were big files, so if you wanted to share a clip with your friend via email, you needed to lower the quality and duration. If you wanted to have it available for the world to see, you would need to host it on a web server, and even then cost would be the least of your worries. What had been a daunting challenge up until this point was building a platform where users could freely upload videos, while remaining profitable. Both YouTube and Google Video would end up becoming early success stories in that regard.

It would not be long before these two websites, especially YouTube, became a part of our daily lives. What I am sure will come as a surprise to some readers, is that these two websites share a very much intertwined history, especially when it comes to captioning. 

Google Video Logo

Throughout 2005, Google Video was definitely leading the competition. But once it started to lose ground to YouTube in 2006 it never recovered. Still, Google seems to have believed that there was some hope for Google Video and to that end, they competed to become the better platform by introducing new features. One such feature was, in fact, closed captioning.

(There is supposed to be a Google Trends graph here, but if it somehow fails to load click here)

Search frequency for Google Video and YouTube between 2005 and 2006.

Banner of the Google Video Blog

On 19 September 2006, support for closed captioning of videos was announced on the Google Video Blog:

Although many of us are responsible for making this possible, it’s particularly meaningful to me because I’m not only an engineer fortunate enough to work on Google Video — I’m also deaf. In some ways this reminds me of when closed-captioning (CC) was first introduced; before that, little on TV made sense and the only movies worth paying for were foreign films, because those were the only ones with subtitles! I now have the same sense of hope that I did then, when you could finally see visible progress and knew for sure that however long it took to perfect things, we really were on the way.

Ken Harrenstien, Google Video blog 

For the first time ever, you had a website where you could both host and stream your own videos and captions. At the time, the CC button seemed like a “subtitles on” option, which you could expect from TV or DVDs. Unfortunately, this historic development was quickly overshadowed by Google’s acquisition of YouTube, only three weeks later in October. Nonetheless, it was quite the accomplishment, and even YouTube takes this to be the epoch for closed captions on their website, as evidenced by future blog posts and conferences.

Soon after, Google Video began a slow descent into obscurity, as the development team slowly migrated to work on YouTube. Although the website went defunct altogether, we still have some snapshots from this era. The thumbnail for this post is a screenshot of one of the first-ever captioned videos uploaded, titled “Google Video & YouTube Support Closed-Captioning”. Although this photo only shows us the UI, fortunately, the “Me at the zoo” of closed captions survives on YouTube! 

The uploader, Dan Greene, who provided us with a lot of insight during my research, later uploaded the video to YouTube in 2009:

Now you might be wondering why anyone would wait so long. The fact of the matter is, it would be another 2 years until YouTube received its own native support for captions. Ken Harrenstien and the rest of the core team working on captions would remain on the Google Video side of things for many more months.

Undaunted, the Google Video team would continue to bring innovations to captioning. While their accomplishments only survive in fragments, a good place to be looking is Google’s own patents, a lot of which have Ken Harrenstien’s name on them. There is US9710553B2 relating to UI design for closed captions and US20140301717A1 relating to support for multiple caption tracks. 

There is one patent in particular which is especially relevant to the community contribution feature, US7992183B1: Enabling users to create, to edit and/or to rate online video captions over the web. To date, this is the earliest known proposal for any form of community contribution to closed captions, on Google’s products anyway. Up until this point, if an uploader hadn’t added captions for a video, that was it, you could not do anything about it. There were services like Overstream, which allowed people to caption online videos, but not only were these relatively unknown, you had to actively hunt for a captioned video. This patent describes a method that could have potentially changed that.

Figure illustrating options for captions in different languages, there are 3 captions available in English authored by Josh M., Kim L. and J. Doe with a score of 4, 3 and 1 respectively.

This form of community contribution would operate on a rating-based system: multiple users would be able to submit closed captions, and viewers would decide which captions were the best by rating them on a scale from 1 to 5. This differs from the modern system, which enforces one translation per video and is split into a submission and review phase. In practice, one could imagine this early concept working much faster. I was unable to find any evidence that this idea was ever realized, but at the very least it goes to show that the need for community contribution was there, even back in 2007.
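
As a rough illustration of the rating-based idea, here is a minimal Python sketch. It is not taken from the patent: the `Caption` class, the `best_caption` helper and the tie-breaking rule are assumptions made up for this example, and the scores simply reuse the ones from the figure described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caption:
    """A hypothetical community-submitted caption track."""
    author: str
    language: str
    score: float  # average viewer rating on a 1-5 scale


def best_caption(captions: list[Caption], language: str) -> Optional[Caption]:
    """Return the highest-rated track in the given language, or None."""
    candidates = [c for c in captions if c.language == language]
    if not candidates:
        return None
    # max() keeps the first track seen when scores tie (an assumption).
    return max(candidates, key=lambda c: c.score)


# Authors and scores as listed in the patent figure.
tracks = [
    Caption("Josh M.", "English", 4),
    Caption("Kim L.", "English", 3),
    Caption("J. Doe", "English", 1),
]
print(best_caption(tracks, "English").author)  # Josh M.
```

Under a scheme like this, a viewer would be shown Josh M.’s track by default, with no separate review step needed.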


Moving into 2008, Google really started to pick up the pace. On 4 June, we saw YouTube’s answer to captions: Video Annotations!  

We’re happy to announce a new way to add interactive commentary to your videos — with Video Annotations. With this feature, you can add background information, create branching (“choose your own adventure” style) stories or add links to any YouTube video, channel, or search results page — at any point in your video. 

The YouTube Team, The YouTube Blog

Video annotations were designed to be a lot more versatile than captions, not solely restricted to descriptions of what is being said or happening on the screen. Still, seeing as there still was no native closed captions feature, it would not take long until people figured that they could use these video annotations for subtitling or captioning videos.

Using annotations:

Q: What will you show us today?
A: I'm gonna show you some nice new armors, how we create them and such
Original Video: Aion – New Armors (interview) – WITH ENGLISH ANNOTATIONS!
Annotations are no longer available on YouTube, so the ones you see here were retrieved via https://invidious.snopyta.org/watch?v=mCg54B69aY4&iv_load_policy=1

And if you are still not convinced that this was a response aimed at closed captioning, get a load of Google Video’s newest feature that was announced the very next day: Closed Captioning Search! This made it possible to search not only for a particular video but through its contents. An example use case could be searching for a particular talk:

Here’s a nice example – search for [“that’s a tremendous gift”] . Make sure you’ve selected List View, and you should see a video featuring Randy Pausch. Clicking on the “Start playing at search term (50:16)” link will take you to a point slightly before the appearance of that caption.

Ken Harrenstien, Google Video Blog
Results for "that's a tremendous gift", an option to skip to the 50:16 mark of Randy Pausch's "Really Achieving Your Childhood Dreams" speech is available.
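
To give a feel for how a caption search like this could work, here is a minimal Python sketch. It is an illustration only, not Google’s implementation: the `Cue` tuple, the `find_phrase` and `as_timestamp` helpers, the two-second lead-in, and the sample cue timings are all assumptions made up for this example.

```python
from typing import NamedTuple, Optional


class Cue(NamedTuple):
    """One timed caption line (hypothetical structure)."""
    start: float  # seconds from the start of the video
    text: str


def find_phrase(cues: list[Cue], phrase: str, lead_in: float = 2.0) -> Optional[float]:
    """Return a playback position slightly before the first cue containing phrase."""
    needle = phrase.casefold()
    for cue in cues:
        if needle in cue.text.casefold():
            # Start a little early so the viewer hears the full sentence.
            return max(0.0, cue.start - lead_in)
    return None


def as_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, e.g. 3016 -> '50:16'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Made-up cues and timings; only the searched phrase comes from the example above.
cues = [
    Cue(3010.0, "(placeholder caption)"),
    Cue(3018.0, "That's a tremendous gift"),
]
position = find_phrase(cues, "that's a tremendous gift")
print(as_timestamp(position))  # 50:16
```

Starting playback a little before the matching cue mirrors the behavior described in the quote, where the link lands slightly before the caption appears.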

Believe it or not, this feature was eventually brought over to YouTube, though it works a bit differently.

Sad to say, these features were not much appreciated at the time, as YouTube was suffering from poor layout changes (see comments on the old blog page) and Google Video was suffering from serious stability issues. There was, however, a silver lining to all of this.

If you remember visiting YouTube in 2008, you might recall some of the interesting gadgets and features they had introduced for the upcoming presidential election. Let’s talk about a little project called Gaudi.

The premise of Google Video’s caption search, being able to search for things people say, might not be very enticing. But what if you wanted to search for things important people, say, politicians, have said in the past? So Google’s Speech Research Group developed a tool called Gaudi, which not only allowed you to search through the speeches of presidential candidates but also generated transcripts for relevant news and politics videos as they were uploaded. By bundling speech-to-text

Gaudi is gone now, but it was once available on Google Labs, as a gadget on iGoogle and it was even embedded on YouTube’s YouChoose page offering the latest updates on the presidential campaigns. It even received some media coverage.

If you would like to learn more about what went on behind the scenes, one of the authors, Michiel Bacchiani, was kind enough to upload a copy of their ICASSP ’09 publication to his website.


So now we had a form of speech-to-text on the largest video sharing website in the world, no matter how limited. But what to do with this technology? Combining it with video annotations would be far from trivial. YouTube still didn’t have closed captions either, forcing people to use websites like YouTube Subtitler. Perhaps now was the time to bring the Google Video captioning features over to YouTube!

In the coming months, members of the team that worked on Gaudi, Google Video’s division responsible for closed captions, and the existing YouTube team would come together to bring closed captions to YouTube.

Join us next week, to see how all these little pieces would come together to make a whole, truly greater than the sum of its parts.

The Impact of YouTube Removing Community-Contributed Closed Captions
https://datahorde.org/the-impact-of-youtube-removing-community-contributed-closed-captions/
Fri, 14 Aug 2020 18:50:06 +0000

Closed Captions have been an essential feature on YouTube for nearly 12 years. They’ve made the platform more accessible, not only by serving transcriptions for deaf and hard-of-hearing users but also by serving as a medium for translating videos into a multitude of languages.

Screenshot of Michael from VSauce
Spooky Coincidences by Vsauce, a video which allows for community contributed closed captions.

For more than half of that time, YouTube has supported one form or another of contributing closed captions to videos which don’t already have them. That is why it came as an unpleasant surprise two weeks ago when Google announced that they were removing community contributions in September:

Community contributions will be discontinued across all channels after September 28, 2020. Community contributions allowed viewers to add closed captions, subtitles, and title/descriptions to videos. This feature was rarely used and had problems with spam/abuse so we’re removing them to focus on other creator tools. You can still use your own captions, automatic captions, and third-party tools and services. You have until September 28, 2020 to publish your community contributions before they’re removed.

Google Support Page, retrieved 13 August 2020

Now to be clear, closed captions aren’t going anywhere.

  • Video uploaders will be able to continue uploading their own closed captions,
  • Captioning services such as Amara will remain available,
  • And of course, YouTube’s own automatic captions are here to stay.

But no longer will viewers be allowed to contribute captions. Previously published captions will remain online; new submissions will simply no longer be accepted. It is unknown if attribution for previously published captions will remain intact.

Even more confusing was Google’s justification for this removal, which they provided on the YouTube Support Community. Lamenting that content creators and viewers expressed dismay at the frequent abuse and low quality in community captions, they attributed the relatively “rare usage” of this feature to the bad name it has made for itself. It was a broken and unwanted feature, which warranted a discontinuation…


Let’s talk a bit about just how “rare” that usage really is.

… the feature is rarely used with less than 0.001% of channels having published community captions (showing on less than 0.2% of watch time) in the last month.

As much as the wording makes the feature seem insignificant, note that it is being expressed in terms of channels. This does not paint a very clear picture of how many viewers are dependent on this feature.

number of youtube channels
There are approximately 16 thousand channels with over a million subscribers; that’s at most 0.0005% of all channels.

Recently tubics, a company that provides SEO for content creators on YouTube, published a blog post presenting some interesting statistics on YouTube channels. Using SocialBlade data, they determined that a very small percentage of all the channels on YouTube account for the majority of viewership. Even by the most optimistic estimate, only 0.006% of all channels have over 100 thousand subscribers, and only 0.0005% have over a million. This shows us that even a feature which is utilized by this small a fraction of channels can have an impact on potentially millions of viewers.


The ulterior motive here is likely to promote Google’s own automatic captions and especially their automatic translation. Their accuracy has been drastically improving over the years, and in some cases this really is the only reliable way to translate a video into a less widely spoken language. But that being said, is it a replacement for community contributions? I think not!

There are a lot of use cases unique to community contributions, which aren’t offered by any of the remaining alternatives.


  • When a user uploads their captions or Google generates them, it’s often a one-time process. If there’s a mistake, no one is going to go back to fix it. But with community contributions, users can build off of previous work by correcting one another, not unlike how Wikipedia works.
Typo: "Which sits in Iraq" instead of "Which sits in a rack"
What are Microservices? by freeCodeCamp.org,
There is a typo in the captions: “Which sits in Iraq” instead of “Which sits in a rack”.
A Rack
Thanks to Community Contributions being allowed on this video, anyone can go in and fix it!
  • Even though community contributions carry the risk of sabotage, they also provide viewers with the power to moderate. Someone snuck in a joke on a video you were watching? Just go into the caption editor and edit it out!
Sincerely, Me || Dear Evan Hansen Animatic by szin,
The 🙁 at 0:21 is likely an artifact from translating the original captions from Polish to English (where there was text), but it’s not uncommon for people to sneak in emoticons or naughty easter eggs where they don’t belong. Community Contributions spare channels from having to moderate these manually by delegating the task to channel viewers.
  • Where automated captions are stuck with plain text, and uploaders are limited by the time they’re willing to invest to stylize their captions, there are people out there waiting to tap into their potential. YouTube supports a plethora of caption/subtitle formats, which a seasoned captioner can use to add color, formatting and emphasis!
Scenarist Closed Captions
YouTube supports Scenarist Closed Captions… but how well? by Jibberuski
SCC is but one of many captioning formats. If you’ve ever watched a video where captions weren’t locked into the bottom of the screen, it was likely captioned in SCC or EBU-STL.
  • In translation, there are times when we don’t want the result to be too literal. Examples include explaining the meaning of a word or the wordplay in a joke… And although there are techniques to recognize proper names, there are sentences that automated translation is not designed to handle.
1984 by George Orwell, Part 1 by Crash Course
Around the two-minute mark, John Green explains the parallels and contrasts between the names of Winston Smith, the protagonist of 1984, and British Prime Minister Winston Churchill. Here, the consciously authored Spanish captions are able to highlight the significance of the words Church and Hill.
The automatically generated captions, by contrast, immediately translate Churches into Iglesias and actually fail to translate Hills into Colinas.

As you can see, there are all too many reasons not to remove community contributions. That is why more and more YouTubers are campaigning against the change. There’s a change.org petition which has already been signed by over 470,000 people. I don’t know about you, but that sounds like a pretty significant figure.

At the very least, Google could try a compromise. It’s still possible to live in a world where the viewers can contribute to captioning, and the process can be automated. Take Google Translate, for instance: they’ve been able to improve its accuracy a lot through the “suggestion” feature.

Making captioning exclusive to the channel owner and Google’s own tools is going to make a lot of lives harder, when people are dying to make those same lives easier. So go on and spread the word! What will your contribution be?
