Scaling the Waterfall, Captions for All; YouTube CC History pt.3

[Thumbnail taken from Niagara Falls, Canada]

Continued from Part II

When YouTube’s automatic captioning was first demonstrated, it was available only to a select few channels. Around March 2010, YouTube decided it was ready and made automatic captioning available to all channels! This also meant that automatic captions could now be translated.

The feature was introduced to provide closed captioning for videos that were not already captioned, but did it get the job done? The reception was mixed. However much the innovation was appreciated, the frequent errors were noticeable. Furthermore, it was only available in English, and so only for a fraction of the videos on YouTube.

All the while, the rapid flow of videos raged on… At the time, it was estimated that 20 hours’ worth of video was being uploaded to YouTube every minute! That rate has only grown since. And that only accounts for YouTube; what about the rest of the internet?

In a CNN interview, Ken Harrenstien, chief engineer of the captions team, once remarked:

What I would like to see happen is that people would see what we are doing here, and realize, “Oh, this is really useful,” and all start doing the same thing 

Whether he knew it or not, Harrenstien’s dream was not so far off…

An important event later in 2010 was the US Congress passing the Twenty-First Century Communications and Video Accessibility Act (CVAA). The act, which aimed to improve the accessibility of technology, introduced regulations such as requiring video programming that is closed captioned on TV to also be closed captioned when distributed on the internet. This meant that news and entertainment media transitioning from television to online streaming would have to bring their readily available closed captions along. So not only was YouTube going to get more caption uploads, but other websites now had to accommodate these changes by adding or improving their own closed captioning interfaces.

The act also mandated that video devices be designed to support closed captioning. In light of this, YouTube brought closed captions to mobile in 2010 and introduced support for additional subtitling. In 2011, they also introduced support for captioning formats used throughout the industry. Previously, positioning and stylization had required the use of YouTube’s native annotations system, but thanks to this change, video distributors now had a way of uploading their own stylized captions.

The worth of stylization is best showcased through video.

[Video: “CPC Closed Captioning placement and formatting” by 18hands]

However, beyond the need for disability accessibility, there was an even bigger need for international accessibility. During Google I/O 2011, a gadget that allowed for live broadcasting of transcripts was showcased to demonstrate the capabilities of the Captions API. To be clear, these were not automatically generated captions but professional transcripts being written in real time. Even so, using automatic translation, viewers from all around the world were able to translate these captions into their own languages. In fact, English captions accounted for only 27.33% of caption viewership; the remaining 72.66% of caption viewers were translating!

The top captioning languages during Google I/O 2011 were English, Spanish, Portuguese, French and Russian.

While all of these developments helped bring existing captions to more people, the question of how to bring high-quality captions to new videos, or videos specifically produced for the internet, still remained. To this end, YouTube would pursue three approaches:

  1. Improving auto-captioning
  2. Incentivizing professional captioning/translation
  3. Providing channels with the ability to crowdsource their captions

Let’s look at each of these in more detail…

Improving auto-captioning

If automatic captioning was to provide captions on any video, it had to work for more than just English. The second language to offer auto-captioning was Japanese, in 2011. Korean and Spanish support arrived in 2012, with German, Italian, French, Portuguese, Russian, and Dutch also receiving automatic captions later that year.

You now have around 200 million videos with automatic and human-created captions on YouTube, and we continue to add more each day to make YouTube accessible for all.

Hoang Nguyen, Software Engineer on the Captions Team, via

It didn’t mean a whole lot to be able to automatically caption 10 languages unless you could do it right. So, as speech recognition technology improved, Google and YouTube upgraded their speech recognition algorithms. Two major upgrades came in 2012, when the speech recognition system switched to a Deep Neural Network model, and in 2015, when it moved to a more specialized LSTM-RNN model.

During one of the most iconic moments in Google’s history, at Google I/O 2015, Sundar Pichai (then a senior vice president, and CEO from later that year) announced that they had lowered their word error rate in speech recognition to 8%!
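For readers wondering what a “word error rate” measures, here is a minimal sketch of the usual textbook definition: the number of word substitutions, deletions, and insertions needed to turn the recognizer’s output into the reference transcript, divided by the number of words in the reference. This is not Google’s evaluation code; the function name `word_error_rate` and the example sentences are made up for illustration, and lowercasing the text before comparing is an assumption of this sketch.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption of this sketch: compare lowercased, whitespace-split words.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of six reference words.
reference = "captions make videos accessible for all"
hypothesis = "captions make video accessible for all"
print(f"{word_error_rate(reference, hypothesis):.1%}")
```

Note that choices such as lowercasing or stripping punctuation before comparing change the score, which is one reason a single headline figure can be hard to compare across systems.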

Things were getting a lot better, but was it enough? Some criticized the accuracy metrics as easily manipulated, and automatic captions were still not a substitute for human-made captioning. So, YouTube began looking into manual captioning alternatives.

Incentivizing professional captioning/translation

A harsh reality of online captioning in the earlier years of the World Wide Web was that it was often distributed under the counter, in forms that were generally associated with piracy. Much like with today’s streaming services such as Netflix, subtitles for movies or shows were often distributed regionally, and it was common for subtitles in a particular language to never show up in most regions. Not to mention, a lot of this media never received subtitles or captioning at all, giving rise to a rich culture of amateur translations and fansubs. The scars of this era are visible to this day. For example, .srt, which is now an industry-standard subtitle format, originated with a program literally called SubRip.
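For the curious, an .srt file is plain text: each cue is a running number, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and one or more lines of text, with a blank line between cues. The sketch below reads that basic layout. It is illustrative only: the names `Cue` and `parse_srt` and the sample captions are invented here, and real-world files may carry extras (such as styling tags or a byte-order mark) that this sketch does not handle.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start_ms: int
    end_ms: int
    text: str


# Timing line, e.g. "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert the four timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(data: str) -> List[Cue]:
    """Parse basic SRT text into cues, skipping blocks that do not fit the layout."""
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1].strip())
        if match is None or not lines[0].strip().isdigit():
            continue
        parts = match.groups()
        cues.append(
            Cue(
                index=int(lines[0].strip()),
                start_ms=_to_ms(*parts[:4]),
                end_ms=_to_ms(*parts[4:]),
                text="\n".join(lines[2:]),
            )
        )
    return cues


# Made-up sample captions for illustration.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Welcome back to the channel!

2
00:00:04,000 --> 00:00:06,250
[upbeat music]
"""

for cue in parse_srt(SAMPLE):
    print(cue.index, cue.start_ms, cue.end_ms, cue.text)
```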

But what about media that was being produced for the internet? Predictably, it wasn’t long before fansubbing or fan-captioning websites, such as Overstream, showed up for videos. Still, these websites could not shake off the negative connotation surrounding them. As for professional subtitling and captioning, which was a fledgling industry when it came to more traditional media, it was struggling to go online. Captioning vendors were out there, but unable to reach the market.

This paradigm was challenged in 2010 with the founding of a platform called Universal Subtitles, today known as Amara.

Amara’s early user interface

Here’s the problem: web video is beginning to rival television, but there isn’t a good open resource for subtitling.

Here’s our mission: we’re trying to make captioning, subtitling, and translating video publicly accessible in a way that’s free and open, just like the Web.

Amara Blog

Right off the bat, Amara made it clear that they wanted to become a champion of online accessibility, through collaborative captioning and subtitling. And unlike YouTube’s Closed Captioning team, Amara was determined not to be bound to a single website.

YouTube did take notice; it is very interesting to note that only a few weeks after Amara’s alpha-launch, YouTube partnered with the DCMP to endorse professional closed caption vendors as “YouTube Ready”.

You may be able to manage creating captions for your videos on your own, but sometimes you have too many videos or your video has elements that need special care. Today, thanks to support from the Described and Captioned Media Program, we’re pleased to roll out a new “YouTube Ready” designation for professional caption vendors in the United States. The YouTube Ready logo identifies qualified vendors who can help you caption your YouTube videos.

Naomi Black, Caption Evangelist, via YouTube Blog

By 2013, YouTube had decided to launch their own translation network. Uploaders could now request translations for their videos directly from YouTube’s UI, and would be redirected to partner vendors.

Although short-lived, the network showed YouTube’s determination to encourage professional translation and captioning, instead of amateurish captioning. While this might have been an ideal solution for big-budget content creators, it wasn’t viable as a general solution.

Providing channels with the ability to crowdsource their captions

YouTube did already have crowdsourcing, before Amara, albeit not natively. There was CaptionTube, and even the older YouTube Subtitler later received some sharing features. The catch was, none of these community captions would show up on the uploader’s channel, unless the captioner(s) got into contact with the uploader.

To make things easier, in 2012 YouTube introduced link-sharing, which allowed the uploader to share a link with captioners so that they could upload captions directly to the uploader’s account. But it was still a tedious process.

Meanwhile, Amara had expanded their frontiers and was now offering “translation workspaces” for organizations such as TED with Amara Enterprise. This gave websites the ability to “moderate” community contribution, instead of having it all in the open.

And as Amara kept teasing YouTube about their error-prone automatic captions, it became clear that YouTube needed to do something. Their answer is what we know today as “community contributions”.

As with the automatic captioning launch, YouTube would take things slowly once again. Starting in 2014, they quietly launched community contributions for Google and YouTube’s own channels. Gradually, channels like Crash Course, Barely Political, and Kurzgesagt would be among the first to enable community contributions. It’s also worth noting the similarities to Amara Enterprise, in that YouTube’s community contributions also operate on separate transcription and review phases.

Community contributions for all channels would eventually go live in late 2015, to little fanfare. It was strange to see YouTube not make an announcement, seeing that you needed to enable the feature from your settings. Nonetheless, this form of captioning also had its merits, and it came to be appreciated as more and more channels discovered that YouTube had such a feature…

Through these different captioning methods, YouTube amazingly did manage to scale the waterfall, to some extent. In 2015, YouTube’s product manager Matthew Glotzbach reported that around 25% of all videos on the site had been captioned in one form or another.

Such was the golden age of closed captioning on YouTube… But was it to last?

Join us next week to find out!

