When we last left off, the stage was set! YouTube had gathered a group of amazing talent to not only bring closed captions to YouTube but to make them monumental!
You had top UI engineers, speech recognition veterans and the hard-boiled closed caption team of Google Video. Above all, the captions team was largely made up of people who were deaf, hard of hearing, or who had a loved one who was deaf or hard of hearing. You could be sure they were going to give it their all.
The team immediately got to work, and their efforts bore their first fruit in late August 2008.
You can add captions to one of your videos by uploading a closed caption file using the “Captions and Subtitles” menu on the editing page. To add several captions to a video, simply upload multiple files. If you want to include foreign subtitles in multiple languages, upload a separate file for each language. There are over 120 languages to choose from and you can add any title you want for each caption. If a video includes captions, you can activate them by clicking the menu button located on the bottom right of the video player. Clicking this button will also allow viewers to choose which captions they want to see.
The YouTube Team, via the official YouTube Blog https://youtube.googleblog.com/2008/08/new-captions-feature-for-videos.html
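For context, the caption files mentioned in the announcement were simple timed-text documents. Here is a minimal example in the SubRip (.srt) format, one of the commonly supported formats of the era (the text itself is invented for illustration): each cue has a sequence number, a start and end timestamp, and one or more lines of text.

```
1
00:00:01,000 --> 00:00:04,000
Welcome to our channel!

2
00:00:04,500 --> 00:00:08,000
Today we're announcing closed captions.
```

To offer the same video with, say, French subtitles, an uploader would prepare a second file with translated text under the same timestamps and upload it as a separate track.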
Some of the first channels to feature closed captions were the channels of BBC Worldwide, CNET, UC Berkeley, MIT and Gonzodoga.
Even though the YouTube Blog was nothing short of a beast, with over 2 million subscribers at the time, the team felt the announcement wasn’t loud enough and decided to also post a video announcement on the official YouTube channel:
An interesting part of this video announcement, in addition to the rickroll around 0:34 into the video, is the mention that captions and subtitles are also helpful for people who speak other languages. This was nothing new, since you could add caption tracks in multiple languages even back in the Google Video days, but the way this remark is worked into the video suggests that they were teasing a new feature.
With the announcement of machine translation in November, we got to see just what that feature was! Viewers could now translate closed captions into whatever language they chose, via Google Translate. The feature has changed a bit over the years, but in its earliest form you could translate any captions which the user had uploaded into any of the languages available on Google Translate:
It’s worth noting, however, that these translations were not permanent; they were designed to be dynamic, improving as Google Translate improved over time. If the uploader wanted to ensure viewers saw a translation in a particular language, they still had to add that track themselves.
On top of this, just a few days later, closed caption support was added to embedded videos, so you no longer had to be on YouTube to view closed captions. The captions team was on fire!
After so many updates in quick succession, the captions team fell silent for a few months, not because they were exhausted, but because they had gotten to work on something big.
CaptionTube also had another interesting feature, reminiscent of caption contribution. Unlike on YouTube, where you needed the uploader’s permission to caption a given video, you could caption any video of your choosing. Even if the uploader didn’t want closed captions on their video, you could keep a copy of your captions for yourself. And if the uploader did want captions, you could simply export your caption file and email it to them.
It had features for converting various subtitle formats, uploading and previewing them, setting the language, alongside a YouTube video in a UI that looked like Premiere or Final Cut. I wrote CaptionTube and the Python API client library myself, but I had nothing to do with the internal caption infrastructure.
John Skidgel, creator of CaptionTube, personal communication via email.
I think roughly 1.5 million videos were captioned with it (a drop in the bucket). I supported the service for ~3 years until YouTube had better internal support, I don’t know how big that team was. After that, I turned it down as they had added the collaboration features. I had thought about adding crowd-sourcing to CaptionTube, but I didn’t have the time and the internal caption team was working on it.
Another project was the aptly named google-video-captions, meant to be a dataset of transcripts for videos on Google’s own YouTube channels.
The google-video-captions project has two goals:

* To provide a public corpus of Creative Commons licensed captions that were transcribed from Google videos.
* To enable community-based translation of Creative Commons licensed caption files for these same videos.

Naomi Black, creator and maintainer of google-video-captions, project description
This project was led by Naomi Black, who at the time was managing YouTube and Google’s video channels. Unfortunately, it died a lot sooner than CaptionTube: updates ceased around August, and eventually Google Code, where the project was hosted, went defunct altogether.
Yours truly has taken the liberty of exporting what remains of this project to GitHub, in hopes that someday the idea might be revived. With improvements to the YouTube API over the years, the task of transcript retrieval should now be a whole lot easier.
All the while, Google continued to make strides in speech recognition. In March, Voicemail transcription debuted for Google Voice:
It met with a mixed reception, to say the least. While admittedly innovative, the accuracy wasn’t all that good. The processing of people’s speech also raised privacy concerns, as indicated by comments left on the video and elsewhere. Google was not going to be as hasty the next time they unveiled a major speech-to-text product.
hey keith this is matt mail drink trying out the anti pants go go voice i translator away this is making into an S M S this is gonna be too long later sent you a transcript on now this would be normal so i wonder how accurate this thing will be in translating when i have to say and today i had a salad and okay double cheeseburger and what else i am comforting over some brainstorm notes and when she say something that michael birthday has if you extend the well and yeah i goes hello so the translate laughing to okay i think that’s about
Google Voice – Transcript Test by Spudart and Sparx
After months of working silently, the captions team unleashed their pièce de résistance: Automatic Captioning! Having learned their lesson from Google Voice, this feature was initially exclusive to a select few channels, primarily educational ones.
A day later, they would give a thorough demonstration of this, and of another feature called Automatic Timing, to an audience in their office in Washington D.C. The audience included accessibility leaders from the NAD, Gallaudet University, the AAPD and even Marc Okrand. This is one of those YouTube videos that make you wonder why they haven’t reached a million views yet! If you’ve got an hour to spare, it is a must-watch!
After a brief overview by Jonas Klink, then accessibility product manager at Google, we cut to Vint Cerf, who delivers the introduction. He opens with how “to organize the world’s information and make it accessible and useful” entails accessibility for people who are deaf, hard of hearing, visually impaired, or motor-impaired.
So I want to tell you, first of all, why accessibility’s personally important to me. Sigrid, who is in the audience over there — wave your hand, Sigrid — and I are both hearing-impaired. Sigrid was totally deaf for 50 years. She now has two cochlear implants. And they work wonderfully well. They work so well, we had to buy a bigger house, because she wanted bigger parties, because she could hear. So this is a technology which is spectacular. I’ve been wearing hearing aids since I was 13. You can do the math. That’s 53 years.
So both of us care a great deal about how technology can help people with various impairments get access to information and be connected with the rest of the world. So quite apart from my job at Google, I have great personal interest in what we’re talking about today.
Vint Cerf, Announcement on Accessibility and Innovation, 3:41
At one point he makes a slip-up, stating that YouTube introduced captions in 2006 when it was actually Google Video which introduced closed captions. Next, after talking about their history together, he hands the microphone to Ken Harrenstien, who you might recall was the chief engineer on closed captions during the Google Video days.
Ken Harrenstien, who had been waiting for this moment for at least three years, continues with a showcase of caption features that had been added to YouTube over the past year: settings to adjust the size of captions, to turn the background off, etc. But at the end of this section, his optimism up until this point begins to fade as he addresses the sheer number of uncaptioned videos.
To provide a visualization, he takes out a labeled bottle of water and tells the audience to assume this bottle represents all the videos that are currently captioned. Then he opens a clip of Niagara Falls from YouTube and tells the audience this represents all of the videos being uploaded to YouTube.
This is our problem. Remember what Vint said earlier? Every minute we stand here and talk, people are uploading 20 to 23 hours of video. Not minutes. Hours. Not 23 videos themself. We’re talking hours. So tons. And that’s every minute, every day. Every month. It just — it’s coming in.
…
So the question is, who’s going to bottle that water?
Ken Harrenstien, Announcement on Accessibility and Innovation, 25:08
How to keep up with this perpetual flow? He proceeds to play a clip from that year’s Google I/O, with captions switched on, then turns to the audience to ask if they notice anything different.
People who notice the mistakes eventually guess that the captions are machine-generated, much to Ken Harrenstien’s amusement. Automatic captioning was finally on YouTube! As mentioned, instead of launching the feature for all users, automatic captioning would initially be available only to a handful of partner channels.
Back then the process was a lot slower: you got a warning saying that the feature was experimental, and then you had to wait some time for the transcript to be generated. They were taking things easy. It would be months before the feature was allowed on other channels.
Still, it was a million times better than nothing, and what’s more, there was an alternative for all the other channels. Automatic timing was a new feature that let users upload an existing transcript; YouTube would then generate timestamps to align the text with the speech. Believe it or not, this feature is still on YouTube to this day!
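To make the idea of automatic timing concrete, here is a deliberately naive sketch of turning an untimed transcript into timed cues. YouTube’s real implementation uses the speech recognizer’s acoustic model to align each word to the audio; this toy version (all names invented) merely splits a known video duration across transcript lines in proportion to their word counts, to illustrate the input and output shapes involved.

```python
def naive_timing(lines, total_seconds):
    """Assign (start, end) times to transcript lines in proportion
    to their word counts. A toy stand-in for forced alignment:
    real automatic timing listens to the audio; this does not."""
    word_counts = [len(line.split()) for line in lines]
    total_words = sum(word_counts)
    cues, t = [], 0.0
    for line, n in zip(lines, word_counts):
        duration = total_seconds * n / total_words
        cues.append((round(t, 2), round(t + duration, 2), line))
        t += duration
    return cues

# A transcript with no timestamps, plus the video's length,
# becomes a list of timed caption cues.
cues = naive_timing(
    ["hello and welcome", "today we talk about captions"], 7.0
)
```

The appeal of the real feature was exactly this division of labor: the uploader supplies accurate text, and the machine supplies only the timing, sidestepping the recognizer’s weakness at transcription while exploiting its strength at alignment.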
The final portion of the demonstration is Naomi Black, the same Naomi who had worked on building a public corpus of captions, showcasing these features from the uploader’s perspective.
As the audience applauds, the question lingers: “Will automatic captioning be able to keep up with the astronomical rate of video uploads?” It brought hope, that was for sure, but the unreliable accuracy still left something to be desired.
Join us next week, when we talk about the ingenious attempts to contain this waterfall!