History – Data Horde | https://datahorde.org | Join the Horde!

Remembering YouTube’s Lost Unlisted Videos
https://datahorde.org/remembering-youtubes-lost-unlisted-videos/
Thu, 12 May 2022 22:55:50 +0000

Melinda teaches high school in the Bay Area and recently reached out to us with a problem. Her students just finished a video history project that she wanted to share with their parents and classmates. But she was concerned about posting the videos publicly because she didn’t want the whole world to find them (frankly, neither did her students). Melinda told us YouTube’s private sharing options — a 25-person cap that’s limited to other YouTube users — didn’t work for her. She needed a better option to privately share her students’ talent.

Later today, we’ll be rolling out a new choice that will help Melinda and other people like her: unlisted videos.

Jen Chen, Software Engineer at Google, https://blog.youtube/news-and-events/more-choice-for-users-unlisted-videos/

On this day, 12 years ago, YouTube introduced unlisted videos as a compromise between a public and a private video. Perfect for sharing your history project with friends, video outtakes, or just about anything you didn’t want cluttering your channel.

Some time later, a non-targeted exploit was discovered which could reveal the links of existing YouTube videos, though not their content. So in 2017, YouTube changed how links were generated to make them more unpredictable. It could have ended there, but it didn’t.

Years later, in 2021, YouTube decided that these hypothetically predictable links might be problematic for old unlisted videos. So they haphazardly decided to automatically set old unlisted videos, uploaded prior to 2017, to private.

Users were offered an option to opt out, but only if their channels were still active AND they acted within a month of the announcement. Unfortunately, millions of videos were lost in the name of security. Vlogs, school projects, outtakes, Patreon videos; things people wanted to share BUT had chosen not to make private.

Is there any silver lining to all of this? Not all is lost. Collections like filmot offer a non-invasive database of metadata on these unlisted videos, minus the videos themselves. There was also a project by Archive Team to archive a few TBs of unlisted videos, even if only a small sample. More than anything, YouTubers have been posting re-uploads of significant unlisted videos, especially those from inactive channels.


Not to sound like a beggar, but we would really appreciate it if you could share this short blog post. Almost one year later this situation has still not become common knowledge. Also be sure to check out our unlisted video countdown from last year:

Pulling Rank: The Legacy of Alexa Internet
https://datahorde.org/pulling-rank-the-legacy-of-alexa-internet/
Fri, 29 Apr 2022 17:25:26 +0000

Alexa Internet and the Internet Archive, two seemingly unrelated entities, have been partners ever since their inception. Alexa’s sunset scheduled for 1 May 2022 is, therefore, also a loss for the web archiving community. As a small send-off to Alexa, here is the story of two twins who grew apart together.


Today, the internet has become such a big part of our lives that it’s hard to imagine a time without it. Yet only 30 years ago, the internet was hardly accessible to anyone. Not in the sense that it wasn’t affordable; rather, what could be called the internet wasn’t very inter-connected. You had separate networks: ARPANET, which was heavily linked to the US’s military-industrial complex; FidoNet, a worldwide network connecting BBSs; USENET, a collection of newsgroups mostly adopted on university campuses… Each network had a particular use-case and was often restricted to a particular demographic. It wouldn’t be until the vision of an “open web” that a common internet would emerge.

In the early 90s, many disillusioned DARPA contractors began leaving ARPANET on an exodus to San Francisco, synergising with the city’s pre-established tech ecosystem. Maybe it was the advent of new protocols such as Gopher and the World Wide Web. Perhaps it was the growing Free Software Movement. Not to mention the gravitation towards the technology clusters of Silicon Valley or the Homebrew Computer Club. It was more than happenstance that California, and the San Francisco Bay Area in particular, had become home to a lot of network engineering experts.

The tricky question wasn’t how to get the internet to more people, it was how to do it the fastest. Many small companies, startups, and even NGOs popped up in San Francisco to address the different challenges of building a massive network. From building infrastructure by laying wires, to law firms for dealing with bureaucracy. Of course, there were also companies dealing with the software problems on top of hardware.

Alexa Internet Logo (1997)

One such company was Alexa Internet, founded by Bruce Gilliat and Brewster Kahle. Alexa started as a recommendation system, helping users find relevant sites without having to manually search for everything. On every page, users would get a toolbar showing them “recommended links”. You can think of these recommended webpages as being like suggested videos on YouTube or songs on Spotify. Alexa was “free to download” and came with ads.

Those recommendations had to come from somewhere, and Alexa wasn’t just randomised or purely user-based. Their secret was collecting snapshots of webpages through a certain crawler named ia_archiver (more on that later). This way they were able to collect stats and metrics on the webpages themselves, over time. This is how Alexa’s most well-known feature, Alexa Rank, came to be. Which sites are the most popular, in which categories, and when? Over time, this emphasis on Web Analytics became Alexa’s competitive advantage.

Alexa was a successful business that only kept growing, but founder Brewster Kahle had something of an ulterior motive. He was also in the midst of starting a non-profit organisation called the Internet Archive. ia_archiver did, in fact, stand for internetarchive_archiver. All the while Alexa was amassing this web data, it was also collecting it for long-term preservation at the up-and-coming Internet Archive. In fact, one can tell the two were interlinked ideas from the very start, as the name Alexa was an obvious nod to the Library of Alexandria. At one point, Alexa (not the Internet Archive) made a donation of web data to the US Library of Congress, as a bit of a publicity stunt to show the merit of what they were doing.

[For your video], there is this robot sort of going and archiving the web, which I think is somewhat interesting towards your web history. It’s a different form. You’re doing an anecdotal history. The idea is to be able to collect the source materials so that historians and scholars will be able to do a different job than you are now.

Brewster Kahle, teasing his vision for the Internet Archive in an interview by Marc Weber (Computer History Museum) from 1996. Fast-forward to 31:53 into the video below.
Tim Požar and Brewster Kahle CHM Interview by Marc Weber; October 29 1996.
Mirror on Internet Archive: https://archive.org/details/youtube-u2h2LHRFbNA

For the first few years, Alexa and the IA enjoyed this dualistic nature: one side the for-profit company and the other a charitable non-profit, both committed to taking meta-stats on the wider internet. This came to a turning point in 1999, when Amazon decided to acquire Alexa Internet (not the smart home product) for approx. US$250 million. Alexa needed growth and the IA needed funding, so it was a happy day for everyone, even if it meant that the two would no longer act as a single entity.

Kahle left the company to focus on the IA, and former partner Gilliat ended up becoming the CEO of Alexa. An arrangement was reached so that even after the acquisition, Alexa would continue donating crawled data to supply the Internet Archive. Their collaborator Tim Požar, whom you might recognize from the ’96 interview above, would remain at Alexa for some time as a backend engineer. A lot of what Požar did was ensuring that Alexa’s crawled data would continue to be rerouted to the Internet Archive. A lot of these data dumps are now visible under the IA’s Alexa crawls collection.

Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day, these data are added to the Wayback Machine after an embargo period.

Afterwards, the IA and Alexa went their separate ways. The Internet Archive expanded to non-web digital collections as well. Books, in particular. The web archive part was dubbed the Wayback Machine.

By 2001, the Internet Archive was no longer a private collection but was made open to the public for browsing. The Internet Archive really lived up to its name and became the de facto hub for archiving on the web. Ever since, the IA has continued to attract not only readers, but also contributors who keep growing the collections.


As for Alexa, Amazon’s bet paid off as they dominated web analytics for the coming years. Alexa rankings became the standard metric when comparing web traffic, for example on Wikipedia. Alexa listed some public stats free to all, but remained profitable thanks to a tiered subscription system. If you needed to know the 100 largest blog sites in a given country, Alexa was your friend. Then you could pay a few dollars extra to find out what countries were visiting your competitors the most. Alexa was great, so long as you were interested in web-sites.

Alexa was born in a very different web. A web of sites. Yet today’s web is a web of apps. Social media, streaming services… The statistics of this web of apps are kept by centralised app markets such as Google Play and Apple’s App Store. Alexa tried to adapt; for example, they changed traffic stats to be based less on crawl data across the entire web and more on shares posted to Twitter and Reddit. Sadly, these changes were not impactful enough to save Alexa from obsolescence.

(Google Search Trend showing the rise and fall of alexa rank, alternative link.)

Amazon telegraphed their intent to either adapt or shut down by gradually dropping features over the past few months. For example, they replaced Browse by Category with a more narrow Articles by Topic. Finally, the service closure was announced in December 2021.

So what will happen now? The closing of Alexa is different from most shutdowns because it’s not only the loss of data itself, but of a data stream. Alexa was, indeed, at one time a web crawling powerhouse, yet it’s no longer uncontested. We still have, for example, Common Crawl, which also came out of Amazon, interestingly. As for the Internet Archive, they have many partners and collaborators to continue crawling the web as well, so they won’t be alone.

Alexa was also valuable in its own right. Though there are new competitors in web analytics, you won’t see many investigating global or regional popularity, or different categories. Likewise, there aren’t very many services interested in overall web traffic, as opposed to per-site analytics. On top of this, Alexa ran for 25 years. That’s a quarter of a century of historical data on which sites rose and fell under Alexa’s watch, unavailable almost anywhere else. Almost.

Just as Alexa helped the Internet Archive grow, from this point on the Internet Archive shall reciprocate by keeping the memory of Alexa alive: not just through the sites crawled by Alexa, but also through snapshots of the public statistics Alexa gathered.

If you have an Alexa account you can also help! Users can export Alexa data by following the instructions here! You can bet any and all data would be very valuable, either on the Internet Archive or elsewhere. Please make sure you act quickly, as there isn’t much time left until May 1.

The Legacy Of MadV, The YouTuber Time Forgot
https://datahorde.org/the-legacy-of-madv-the-youtuber-time-forgot/
Mon, 05 Apr 2021 23:07:52 +0000

On 5 April 2020, MadV‘s long-dormant YouTube channel had its first upload in almost a decade. A year later, that video sits at under 4,000 views. It might be hard to believe, but that channel was once among the top 100 most subscribed on all of YouTube!

Chin up!

MadV

The name MadV might not be all that familiar to you, unless you were active on YouTube in 2006. MadV was an early YouTuber whose videos spanned music, illusions and collaborations.

Have you ever randomly gotten a video titled Re: One World to appear in your search results? How about several? These are an artifact of YouTube’s old video responses feature which allowed users to respond to videos with videos instead of plaintext comments.

Re: One World

These videos were all responses to MadV’s One World collab invitation. In this Collab, YouTubers, not in the sense of channels but YouTube folks like you and me, were invited to scribble a message, any message they wanted to share with the world, on their hand and respond to the video.

It might not seem like much today, but in its time One World went viral, drawing in thousands of responses from all around the world. In fact, for a time it held the honor of being the most responded-to video of all time.

A sample of some of the responses to One World, retrieved via the Wayback Machine.

Unlike the collaborations of today, participation wasn’t synchronous or live, nor was it merely a series of clips stitched together in an editing room. One World was almost like a small website or community of its own, where people could also respond to responses, forming new video chains.

A few weeks afterwards, MadV compiled his own picks into a video titled The Message. Following a nomination for the most creative YouTube video of the year, and some minor press attention, MadV would continue to organize similar collaboration videos in the months to come.


So what happened that we don’t hear much about MadV anymore? Two things actually.

  1. If you actually look at the view counts on his channel, you will notice that they are not all that high. You might also notice that The Message is still there but the original One World is missing. As it turns out, not everyone seems to have been happy with all the attention MadV was receiving. MadV had the misfortune of getting his account hacked twice, which de-linked a lot of responses and hurt the view count of the video. In a way, his account was severed from YouTube’s collective unconscious.
  2. Response Videos did not last long on YouTube. Spam and the infamous reply girl phenomenon are often cited as reasons YouTube dropped response videos. Thereafter, any surviving response videos were de-linked and new video responses could no longer be submitted. If a response video used the default “Re: Original Video” title, you might still find it lying around. While MadV also made other videos, this signaled the abolition of his niche, seemingly alienating him from the platform.

MadV and his channel represent a different era of the web. Whether it was a better or worse era, it was a time before the commodification of responses: before checkmark verifications, before reply videos were replaced with celebrity call-out videos, before any notion of a Cringe or Cancel Culture… Being able to speak one’s mind before the crowd had not yet become a luxury.

MadV’s legacy could have been a card trick with 3 million views; it could have been getting the YouTube staff to do a face-reveal; it could have been One World’s status as the most responded-to YouTube video of all time. Yet what has endured more than any of these achievements is the memory of MadV’s projects. A generation of internet folk grew up inspired by MadV to make interesting collaborations and projects of their own. Almost like the plot to a time travel story, MadV saved the world even if it meant that he himself would be forgotten.

Hope is not yet lost. People have found new ways to respond to one another. Just consider the plethora of Iceberg videos lately: trends might not be able to spread directly from video to video, but they continue to do so from viewer to viewer. Or what about TikTok duets?

https://www.tiktok.com/@madisonbeer/video/6826515776124505350?is_copy_url=0&is_from_webapp=v3&sender_device=pc&sender_web_id=6923843985204463109

Chin up, we are all in this together after all!

To learn more about MadV, check out his Wikitubia and Lost Media Wiki pages.

Back in a Flash: The Super Mario 63 Community!
https://datahorde.org/back-in-a-flash-the-super-mario-63-community/
Fri, 04 Dec 2020 22:38:11 +0000

Mario games are fun and often well-designed, that’s a given. But have you ever wanted to design your own Mario levels? Then chances are you’ve heard of Super Mario 63. Long before Mario Maker was available, SM63 was a unique 2D Mario Flash game, incorporating elements primarily from Super Mario 64 and Super Mario Sunshine, and perhaps best remembered for its fleshed-out level editor.

It currently boasts over 7 million views and 10 thousand favorites on Newgrounds, has been mirrored on hundreds of sites and has several thousand user-generated levels. Suffice it to say, the game has had a lasting impact on a lot of people.

As you may know, support for Flash Player comes to an end this December. But the Super Mario 63 community has taken the necessary steps to survive the end of Flash Player. So in honor of Flashcember, here’s a brief history of what SM63 is, was and will be in the near future…

Do you still use Flash Player? Data Horde is conducting a survey to see how frequently people continue to use Flash Player even at the very end of its lifespan. It would mean a lot to us if you could spare 5-10 minutes to complete a very short survey.

Humble Beginnings

https://www.facebook.com/runouwwebsite/posts/10153403522580870

Believe it or not, the inspiration for Super Mario 63 was a fan-made spritesheet of all things. Sprite artist Flare had edited Mario sprites ripped from Mario & Luigi: Superstar Saga to give Mario his water pack F.L.U.D.D., in the style of Super Mario Sunshine. Intrigued, Runouw decided to make a 2D Mario Sunshine of his own. But who was Runouw?

Runouw is not the username of a person, but in fact a team! Twins Robert and Steven Hewitt to be exact. For over a decade, the duo have used the name Runouw to upload games, videos and sprite-art to various websites. Generally for their games, Robert oversaw programming and Steven the art design.

So Runouw got to work to make their own 2D Mario Sunshine, and they debuted their first demo, titled Super Mario Sunshine 128, in November 2006. Although the inspiration from Super Mario Sunshine was still very much there, Runouw had decided to incorporate mechanics and assets from other Mario games as well. Right from the get-go, the game featured levels from Super Mario 64 and spin-attacks a la Super Mario Galaxy.

In the earliest known version of Super Mario Sunshine 128, an experimental “Wiimote” control scheme is also available in addition to the more familiar keyboard controls and seems to have been designed for Wiicade. The Wiimote controls allow Mario to be controlled with the mouse only.
(Pictured: With Wiimote controls on, Mario will always try to follow the cursor, the shine seen in the lower-center of the image; by dipping the mouse while Mario is in the air, a dive can be executed.)

Over the course of the next 3 years, this game would evolve into the SM63 we all know and love today. Updates that followed introduced new levels, power-ups and of course, the beloved level-editor. The name was changed to Super Mario 63 in 2008 and you might have unknowingly also played earlier versions of the game. A thorough version history is available on Runouw wiki for any readers who want to compare the gameplay across different versions.


Super Mario 63 Classic

The most popular, and likely most familiar, version of SM63 (aka SM63 1.4) was first released on SheezyArt on June 26th 2009, followed by the Newgrounds version one day after. You have your basic premise: Bowser kidnaps Princess Peach and it’s up to Mario to save her. Instead of Stars, you’ll be collecting Shine Sprites like Super Mario Sunshine. Couple this with perfect controls, superb level design and some very creative rehashing and you have a game which is already a ten. But it was the Level Designer which really cranked it up to 11.

Edge of the Mushroom Kingdom, a challenging final level Runouw made in response to people complaining about the main story being too easy.
Gameplay by Landy25N

The level editor allowed players to make their own levels by combining and placing items Runouw had already programmed: tiles, enemies, sling stars, etc. While not everything in the game was available in the editor (event-triggers, for instance, were missing), it probably had 95% coverage. You could reposition, replicate or repurpose anything you saw in the game with zero programming knowledge! And even better, you could share your levels on a portal where people could rate or comment.

Ingeniously, the level editor was very well-tied into the main game! Besides the Shine Sprites, SM63 had a second collectable: Star Coins. Unlike Shine Sprites which were required to progress the story, the Star Coins were off the beaten path and were needed for unlocking new features. Most notably, unlocking Luigi and new tilesets in the level editor. If you saw lava, which was unavailable by default, in someone else’s custom level, you had to go back to the main game and hunt down some tricky star coins to be able to unlock it for yourself. Or likewise, if you blindly played through the main story while ignoring the level editor, you would constantly be notified whenever you unlocked a new tileset, encouraging you to try it out.

The Ferris Wheel from Yoshi16’s Amusement Park, one of the highest rated SM63 custom levels of all time.

It was really after this version of SM63 was released that the forums on runouw.com came to life, because people needed to register if they wanted to be able to publish their levels on the portal. Although the forum had been used to share levels directly (via save/load codes) and talk about development prior to the 1.4 release, with the game’s popularity Runouw’s audience grew quite a bit. Before they knew it, the forums were frequently hosting level design and art contests.


Interim

Following the astounding success of SM63, Runouw was determined to keep making more games. While not all of these projects were successful (notably a canned Super Smash Bros. engine and a Star Fox engine), they seem to have sought out a style of their own. Only a few months after SM63 came GT & the Evil Factory, a real-time RPG similar to Megaman Battle Network, with entirely original (albeit simplistic) character designs.

Runouw’s legacy is, funnily enough, called Last Legacy. First released in 2013, LL took a lot of influence from Zelda II and was a 2D action RPG with some interesting mechanics. Almost as a call-back to the SM63 days, the player has the ability to terraform tiles using their mana. LL (and Null Space) also featured their own level editors, although neither was as popular as the SM63 editor. A third chapter to LL has been in development for a few years now, but it’s unlikely that it will ever be released, seeing as Runouw seems to have lost interest.

Between GT and LL, Super Mario 63 received a final update, sometimes referred to as 1.5 or the 2012 version. But more commonly this final version is taken to be the canonical Super Mario 63 and the 2009 version is referred to as SM63 Classic.

Thwomp Dungeon: Emerald Trials by ~Yuri, a level showcasing the Thwomps added in the 1.5 level editor. Click the Play Level button on the right side of the post to instantly jump in without having to load in the level-share code.

The 2012 version also introduced some changes to the level portal, which migrated ratings/comments to the forum. The 2009 portal was dubbed the classic version and the archive sports an astounding 45,000 levels, a few times more than the modern SM63 portal. That being said, the modern portal also has its advantages, such as being able to jump right into levels from the forum without having to copy lengthy level-sharing codes. Finally, Runouw made an .exe version of the game also available, freeing SM63 from the clutches of Adobe, at the cost of no longer being cross-platform.

From then on, Runouw wasn’t actively involved in the development of SM63 any further, having relegated the role to the forum community who kept organizing events all the while. An unfortunate event came when Nintendo, who hadn’t taken any issue with the game in its heyday, decided to issue a Cease & Desist on SM63 in 2013. This resulted in the Newgrounds version of the game being taken down and jeopardized the runouw.com version. Couple that with the death of SheezyArt that same year and you had a recipe for disaster.

During these dark days, the player-base of the game was severely crippled and any sense of community outside of the forums was nonexistent. The saying goes that it’s darkest before the dawn, and in hindsight this C&D would prove to be a trial by fire. The retaliation of the determined community in those days would inadvertently lead to SM63 surviving the Flash Player killswitch!


The Super Mario 63 Renaissance

PixelLoaf Wiki
PixelLoaf Discord Server Logo
(Recently rebranded to Hazy Mazy Cafe)

Contrary to initial fears, Nintendo didn’t take any further action against the runouw.com version of SM63 or the forums. So for the next two years the forum community kept the fire burning. When Discord came around, they became early supporters, starting a server called PixelLoaf in early 2015. Later that same year, the C&D on SM63 would expire, at least bringing back the Newgrounds version of the game.

After helping found PixelLoaf, the Runouw brothers would slowly fade out of sight, presumably since they were continuing their education. From then on, PixelLoaf gradually replaced the forums, becoming the new SM63 central. Level design contests continued, and speed-running, which had been considerably less popular during the forum days, started to gain a lot of attention, eventually splitting off into a server of its own.

SM 63 100% Speedrun in 54:51 by TheGaming100, an active community member,
currently ranked third on speedrun.com

So seeing as Runouw had ended development, where did that leave PixelLoaf? The community had been testing the limits of the level editor for years at this point, so of course the next step was modding the game.

Modding gravity and cheats, gameplay by Creyon.

There’s also a WIP project to introduce a new level editor, which does not depend on Flash.

So much for Super Mario 63! Let’s talk spiritual successors.

Super Mario 127 is a continuation of SM63 led by SuperMakerPlayer and other community members. Oh boy do the visuals and gameplay look good! It doesn’t use Flash, it’s being made in Godot! And of course people are already speed-running it:

SM127 0.6.0 100% 30:21 by April

Another continuation of SM63 is Super Mario 63 Redux, led by @ShibaBBQ. Where SM127 is a modernization of SM63, SM63R aims to be more of a remake from the ground up. So that means controls more akin to SM63 and other features that improve the gameplay experience without changing the core mechanics around too much.

It’s funny how Super Mario 63 started with a spritesheet, and now, years later, Super Mario 63 has inspired an artist to make a spritesheet of their own.

On that note, Runouw made a brief comeback recently. Seeing as the forum activity had moved to Discord they’ve frozen the forum and are now redirecting people to the server. Before vanishing off the face of the internet once more, Runouw finally uploaded the full Source Code of SM63 to GitHub. It’s safe to say SM63 couldn’t be in a more secure place than it is today.


What the future holds

The old levels might need some organizing and the search function of the level portal definitely needs fixing. But at least levels from over 10 years ago are still up and online. The forums might be dying, but the Discord server is as active as ever. In fact, they recently rebranded themselves as Hazy Mazy Café.

Hazy Mazy Café logo in time for the Holiday Season

When January comes around, Super Mario 63 will still be playable through the .exe version. And what’s more, Flash emulation is coming along nicely. You should expect to be able to play the game on Newgrounds or the Internet Archive with Ruffle. Bugs? Thanks to Runouw graciously sharing the full source code, testers and developers will have the perfect reference to pinpoint issues in their ActionScript implementations.

Not only has SM63 outlived Flash; through fan-sequels like Super Mario 127 and Super Mario 63 Redux, I’d say we have a lot more good news to hear about.

Long story short, the SM 63 community has set a great example by showing the world how to go around walls that you can’t bring down. Time will tell what the future holds, but things are looking bright!

[Obsoleted] YouTube removed community translations, but there is a workaround!
https://datahorde.org/how-to-submit-accept-community-translations-on-youtube-a-work-around/
Mon, 12 Oct 2020 20:04:36 +0000

Edit: On October 28, around 8 PM (GMT), the old caption editor was shut down for good, blocking off any further contributions. For external alternatives to community captioning, check out our Captioner’s Toolkit page:

For the past few years YouTube had been supporting community captions, a feature which allowed users to submit captions or translations for videos of other channels. On September 28 the feature was removed and the menu to access it was hidden.

However, you might still spot new videos with community contributions published after September 28. Take a look at this video uploaded on October 9. Notice that the Caption author is Dark_Kuroh, different from the video uploader.

But how? Time travel? As it so happens, even with all the menus hidden, it is still possible to access the old captions editor. This method requires the uploader to know where to check, so if you are submitting captions or translations this way, it’s best to let the uploader know the language and the video.


How to submit community captions

So you want to caption or translate someone else’s video… Assuming that the channel still has community captions enabled, go to the following URL:

youtube.com/timedtext_editor?action_mde_edit_form=1&v={video code}&lang={language code}

Example: http://youtube.com/timedtext_editor?action_mde_edit_form=1&v=vCxz2lSeer4&lang=en

where {video code} is the video’s ID (the string at the end of the video’s URL) and {language code} is the abbreviation for the language you want to translate into. Fortunately, you can also switch between languages later, so if you don’t know the abbreviation you can use en to open the editor in English and then switch to your actual language through the Switch Language button.
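If you find yourself building these links for several videos or languages, a tiny script can save some copy-pasting. Here is a minimal sketch in Python; it simply assembles the legacy editor URL described above, and the video ID and language code in the example are the ones used earlier in this post:

    # Build the legacy caption editor URL for a given video ID and language code.
    def caption_editor_url(video_id, lang="en"):
        base = "https://www.youtube.com/timedtext_editor"
        return base + "?action_mde_edit_form=1&v=" + video_id + "&lang=" + lang

    # Example: the video linked above, starting from the English editor.
    print(caption_editor_url("vCxz2lSeer4", "en"))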

When you’re done, don’t forget to submit by clicking on the Submit Contributions button in the upper right corner.


How to accept community captions

Previously, you were able to view community submissions from the Community Tab on YouTube Studio. Unfortunately, these are now hidden. So you will need to have an idea of which videos and languages to check.

If you hadn’t enabled community contributions before, it’s not too late! Simply go into YouTube Studio > Videos and choose the videos you would like to enable contributions on. Go into Edit > Community Contributions and switch it on. Lastly, don’t forget to click on “Update Videos”.

You, as an uploader, can also use youtube.com/timedtext_editor?action_mde_edit_form=1&v={video code}&lang={language code} to access the caption/translation submissions you have received. A good place to start from could be some of your most viewed videos, and you should definitely pay attention to your subscribers to see if they are trying to tell you to accept any of their submissions.

All you have to do when you find a community submission is to click on the Publish or Publish edits button in the upper right corner.


While YouTube is still working on their permissions system and the community is banding together to find alternatives of their own, it’s important to endure through this transition period. So here’s hoping this tutorial helps you continue to add/receive translations on your videos for a little longer…

The Untold Story of why YouTube is removing Community Contributions; YouTube CC History pt. 4
https://datahorde.org/a-history-of-youtubes-closed-captions-part-iv-downfall/
Fri, 18 Sep 2020 20:58:47 +0000

Continued from Part III

Last time, we saw YouTube introduce new ways to add closed captions, as well as some UI improvements. Owing to these changes, the quantity of captioned videos increased drastically between 2009 and 2015. However, outside of accuracy improvements to automatic captioning, it cannot be said that quality was improving to match this quantity.

This lag in quality can be attributed to a problem many online platforms suffer from: bundling closed captioning with subtitles. If you recall, YouTube had taken note of the growing internationalization of their platform and so began diverting resources accordingly. But at some point, international accessibility had become the greater priority. And this is noticeable through gradual changes in the UI, such as relabeling “automatic-captions” as “auto-generated” or naming the captions panel in YouTube Studio as “Subtitles”.

But as we will soon see, these changes were not only in name…


Although the terms captions and translated subtitles are used interchangeably, there is a slight nuance between the two. They differ in what additional information they provide. Closed captions are meant to complement the audio, and that means that in addition to a transcript of what is being said, there should also be sound cues. 

Translated subtitles, on the other hand, are meant to break language barriers, and more often include footnotes such as definitions or translations of proper names. It would not be correct to say that one is superior to the other, but they are meant for different purposes.
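To make the distinction concrete, imagine a hypothetical scene in which a character walks in while a door slams off-screen. A closed caption might read “[door slams] I’m home!”, cueing in viewers who cannot hear the sound, whereas a translated subtitle would simply render “I’m home!” in the viewer’s language, perhaps with a footnote if the greeting happened to be a pun that doesn’t translate.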

Yet as one can imagine, bundling these two into a single feature would inevitably lead to an ambiguity of which form of transcription to adhere to. And that is precisely what happened on YouTube. Even if there were more and more captions by the day, these were not being authored with the needs of the Deaf Community in mind. 

In reaction to these low-quality captions, the NoMoreCraptions (sic) movement was born. The movement was headed by Michael Lockrey (aka TheDeafCaptioner) and Rikki Poynter, who came up with the name independently of one another. Yet theirs was a common goal: To spread awareness to combat low-quality captions.

If Google and YouTube aren’t going to do the right thing on accessibility, then it looks like it’s going to be down to you and me.

TheDeafCaptioner, 2014

Lockrey was among the earliest critics of YouTube’s low-quality captioning and other shortcomings in accessibility. His focus was on the inadequacy of automatically generated captions. Around the same time that YouTube was testing the closed-beta for community contributions, Lockrey launched nomorecraptions.com. A spiritual successor to CaptionTube, the web-app allows users to caption any YouTube video of their choosing (regardless of permissions) but they can only keep the caption files to themselves. A few years later, it also received a feature to auto-import automatic captions, so captioners would not have to start from scratch.

Of course, automatic captions were not the only problem. There were also the issues in manmade captioning, which became a lot more frequent once community contributions became available for all channels in late 2015. This is when Rikki Poynter comes into the picture.

The good thing is that with the announcement of the fan contribution subtitles and captions, more big YouTubers have had their viewers help them caption their videos, and that’s pretty awesome. Here’s the bad: A lot of these big YouTubers, okay a handful – so far that I’ve seen – of these big YouTubers, they don’t check over their stuff and there’s a lot of incorrect captions, not only incorrect, but there’s a lot of jokes that get put in there, and it basically looks like it’s all for laughs.

Rikki Poynter, Stop Adding Jokes and Useless Commentary Into YouTubers’ Captions!

Poynter would begin an extensive campaign to promote proper captioning and to fight against caption vandalism. Sadly, much to her dismay, she was not able to make a whole lot of noise. It was rare to see any of these big channels taking responsibility for these low-quality, sometimes even malicious, captions.

You see, although community contributions were designed to have an independent contribution and review phase, all it took was a few alt-accounts for a person to force their captions to be accepted. It was an Achilles’ heel, and it would come back to bite YouTube in the near future…


2017 was a turning point for YouTube. At this point, over one billion videos had been automatically captioned! Approximately 15 million views came from automatic captions per day! Confident in their technology, YouTube also slowly began to make the automatic translation feature increasingly prominent.

Whether or not it was due to a vision of eventually fully automating captions and subtitles, around June they shut down their professional translation network. As if to fill the void, YouTube all of a sudden began actively promoting community contributions, which they had been silently updating up until this point, through video tutorials:

(Click here if the above video is inaccessible)

Now that people were actually becoming aware of this feature, more and more channels began enabling community contributions. Consequently, as viewers caught on, community contributions would flourish. All of a sudden, it became possible for nearly any channel to reach an international audience once it had a large enough following.

Alas, all the flaws and exploits which were present in 2016 were still there and would remain there for some time to come. And over time, spammy and outright malicious captions grew increasingly prominent. While Poynter and other critics kept trying to raise awareness of the issue, nobody was giving them the time of day. It was going to take a blunder so audacious and so mind-boggling that people could not miss it.


On August 20, 2019, 3kliksphilip uploaded a video titled The Much Problem Of Translate on his kliksphilip side-channel, where he announces his decision to disable community contributions. He laments that he is tired of having to back-translate these so-called contributions to filter out swear-words, self-promotions, or altered links that people were slipping in. A big YouTuber was finally bringing attention to a years-long problem, but this was not the straw that broke the camel’s back; that would come only a few hours later, on August 21.

So I was on the French version of YouTube, and I decided to click on a PewDiePie video, because who doesn’t want to watch a PewDiePie video, right? And this is what happens when you go on a PewDiePie video in a different language, that’s not English. 

So it looks perfectly fine before you do click on it, you know the title says “We BUILT the GREATEST thing in Minecraft”. But when you click on the video the title updates and says “SUBSCRIBE TO CHANNEL AT THE DESCRIPTION”. But then at the very top of the description, there’s a link which takes you to a completely different channel which says subscribe next to it, and it’s not PewDiePie’s, it has nothing to do with PewDiePie it’s a completely random YouTube channel.

JT, Pewdiepie’s Translators ruin his channel…

YouTuber JT would, unlike 3kliksphilip, actually call out a caption saboteur who had done a lazy job covering his tracks. This, however, only led to an escalation of hostilities. Before anyone knew it, Twitter and Reddit were rioting. PewDiePie himself would end up disabling his community contributions and explaining why in a video, ironically titled Best Week Ever, currently standing at over 18 million views!

YouTube finally took notice and at first tried to downplay their responsibility:

Failing to calm the crowd, YouTube then took a radical decision, which they never repealed, to make it so that any and all community contributions had to be verified by the video uploader. Any spam/misconduct sneaking into their captions would now be their sole responsibility:

As a result of these events, it became much more difficult for contributions to ever be published. Many submissions are currently stuck in limbo, waiting indefinitely for uploaders to publish them.
In the end, the toiling translators were demonized and the Deaf Community were still not getting properly captioned videos…


Was that the end of it? You wish! The team responsible for captioning in 2019 might have had some familiar faces, but their attitude was nothing like the team in 2009. Gone was the image of pioneers who worked tirelessly to bring accessibility to the internet and truly cared about what they were doing. All that remained was a façade, buried somewhere under the behemoth of all those robotic subtitles.

Now, it would absolutely not be fair to say that the captions team was acting irresponsibly of their own accord; rather, they were acting under the wishes of their company at large. When people were ranting on Twitter, they were ranting at YouTube, not any of those amazing people. And it would seem that YouTube, as a company, chose to intervene here.

The decision taken by YouTube to only allow uploaders to publish contributions was an irrational one, hastily made in the heat of the moment with little regard for consequences; consequences which would have mattered to a team whose priority was providing accessibility and not garnering reputation.

It was not long before people were asking for alternatives, such as contribution moderators. But YouTube had other plans in mind… On April 23, 2020 (the anniversary of the first video on YouTube, no less), the Creator Insider channel wanted to have a talk about The Future of Community Captions? They explain that they are considering completely removing community captions, seeing as content creators are unhappy with the feature, and seeing that contributions are not used “that much”.

Highly unusual for a video about accessibility features, where Google often boasts about their inclusivity, the only person featured in this video responsible for closed captions is Product Manager James Dillard. And that’s far from the only thing unusual about this announcement; Ashlee Boyer wrote an entire Twitter thread recounting the oddities and unfortunate implications throughout the video.

Of course, since this was only a Creator Insider video, it didn’t get that big of an audience, only being noted by Deaf content creators as a possible red flag. Unchallenged, the decision was made official by the end of July. The scheduled removal of Community Captions on September 28 was reported through the YouTube Help page, the YouTube Support Community, and finally a now-unlisted Creator Insider video which provides the most detailed explanation for the course of action.

First of all, we get to see a glimpse of a change in the hierarchy concerning the captions team. It’s now a branch under the Creator Product team, and we have the lead on that team, Ariel Bardin, accompanying the captions product manager James Dillard from the previous video. While a lot of points are demystified in this video, such as how this removal decision came in the wake of updates to YouTube Studio and how YouTube would work to roll out a “role” system for people in need of captioning*, ultimately it misses the mark. The reasons provided here do not sufficiently justify the removal of this feature and drew much ire from the few people who did manage to find the link to the video.

* A minor note here: in YouTube’s realization of the idea, roles allow specific people to be assigned permission to contribute captions at all, rather than serving as moderators in charge of approval.


Public outcry ensued soon after, with many YouTubers criticizing the decision. But as it has always been with captions, it’s been difficult to win the numbers game: while these are moderately big channels, they are not going to bring in 18 million views. There is currently a change.org petition with over 500 thousand signatures. #DontRemoveYoutubeCCs trended on Twitter at one point. Even a number of Deaf charities have pleaded with YouTube to reconsider.

In hopes of appeasing those unhappy with the decision, YouTube has offered eligible channels a 6-month, later extended to 1-year, Amara subscription. They also haphazardly began rolling out the new “roles” system, which not only does not support closed captioning yet, but was itself missing closed captions on the announcement video for the first few days.

No matter how hard they might try to win people over, the decision to do away with community contributions altogether, instead of attending to the issues which have made the feature exploitable, is nothing short of betrayal. Not only a betrayal of users and content creators on YouTube’s platform, but even of Google’s own employees, some of whom waited almost a decade for community contributions to become a feature.


Even though we have reached the present day, the history of closed captions is not over. It is merely turning over a new page. Where YouTube has fallen short, projects such as externally hosting community contributions have come about:

As for us here at Data Horde, we are working to collect as many of those contributions stuck in limbo, never to be accepted by the video uploader, as we can. In addition, we will also be grabbing caption credits, as YouTube made no comment on what was to come of these.

If you are interested in this project, head on over to the Discord Server below:

https://discord.gg/nacFvU6

And finally, to all of the video uploaders out there, please take the time to accept any unpublished non-spam community captions on your channel while you still can. This is not only important because it feels like the right thing to do, but also because caption formatting and stylization are maintained only when an uploader accepts a contribution.

(Note that the editor preview might lie by rendering the captions differently. The English captions in the video below are an example of how this will work if a formatted contribution is accepted correctly.)

Make sure to also follow Data Horde in the coming days, as we bring more updates on the current situation and share resources to help content creators and viewers who will be harmed by the removal of community contributions…

What Was It Doing There? Quick Anecdotes Of Games We Found In The Weirdest Locations
https://datahorde.org/what-was-it-doing-there-quick-anecdotes-of-games-we-found-in-the-weirdest-locations/
Tue, 15 Sep 2020 18:56:30 +0000

Originally published on: https://medium.com/bluemaximas-flashpoint/what-was-it-doing-there-5d471188c823

Imagine never being able to play the one game you enjoyed so much as a kid. The thought of a beloved game being lost would scare most gamers: Mario Kart gone without a trace, Sonic and Knuckles thought to be a fever dream. However impossible this may seem, for many this has become a scary reality. In the world of web games, the internet is an ever-changing place. A game can be played one minute and gone the next, leaving only mere mentions of it from a few people, stranded in an ancient forum, hosted on a potato.

Because web games can be lost so easily, there are a lot of people searching for their web game white whale. Enter the Hunters of Flashpoint. We aim to find what others consider lost: from Postopia breakfast bonanzas to games that are generally considered to carry curses. Hunters find the weirdest nonsense you thought was a fever dream. See, web games are fairly unique, in the sense that they can easily be put on multiple sites with little to no effort. As a result, many web games were hosted on thousands of different sites, plenty of which are carbon copies of each other. Such an effect is a double-edged sword. Flash developers have had their works stolen time and time again, which is bad for business. However, when that original site goes down because the creator decided that it just didn’t live up to their expectations, the game runs the risk of being lost to time. Luckily, some Iranian fellow has your back seeing as he stole the game 3 days after it went up.

With the nature of our work, some of the games we have found have some interesting stories behind them. One Hunter who goes by the online username of ‘Steviewonder’ curated the 28 Weeks Later movie tie-in game, not realizing we thought it lost. Another of Stevie’s stories happened during a time he was scouring open Dropbox accounts, where he found a lost game hidden in the files of a random person from another country. A version of the King.com Luxor game was found on a Chinese site, miraculously working with all assets. I myself am a fairly new face to the project, but even I have my stories. I was able to find South Park: Big Wheel Death Rally (screenshot in the thumbnail) on a site called joflash.hu, which is a Hungarian site that for some strange (but lucky!) reason was not using the broken embed every other site had used. The owner of joflash.hu had instead taken the .DCR and all its assets and ported them to their site, bypassing the problem everyone else who stole the game had when the embed died. Much more recently though for me was a game by the name of Pebbles Popstar, one of the lost Postopia games. This game turned up on a site that I won’t directly mention due to the fact it hosts copyrighted material. (I will say though it’s a Polish filesharing site themed around hamsters, which should be all the info you need.)

However, odd locations on the internet aren’t the only places lost games have turned up. Computerdude77 asked an unusual question to the Flashpoint staff one day: could games be extracted from an old Internet Explorer web-cache? Turns out, after some effort, yes, yes we could. Hearing this information, Computerdude77 took it upon himself to search through the web cache of his grandmother’s computer in hopes of finding some games he played as a kid. Boy, did we get lucky. Many a lost game was found, and a few games that were already in Flashpoint but incomplete were finished thanks to Computerdude77’s work.

Not all hunting goes smoothly though, and there have been more than a few blunders. One time I spent 3 days working on tracking down a game from the Flashpoint Lost Games list. Everywhere I looked for this game, it was missing some assets or it hadn’t been copied properly. After 3 days I finally found a copy of it. I was so excited. I posted my findings in the Hunter Lounge, only to be told that the game had been found months ago and curated under a different name. Nobody had marked it on the sheet as found, so I had spent days looking for a game that was already rescued. But hey, those things happen when volunteering one’s own time to save history. These examples are why we need more folks to come together to help save the web games of our collective childhoods. Adobe ends support for Flash in 2020, and when that happens so much more will be lost. It is a race against time, and we need everyone.

If you would like to find out more about this strange and fantastic project, just check out the community spotlight for more info, or if you’re already sold head on over to the website or Discord Server!

Scaling the Waterfall, Captions for All; YouTube CC History pt.3
https://datahorde.org/a-history-of-youtubes-closed-captions-part-iii-scaling-the-waterfall/
Sat, 12 Sep 2020 19:09:55 +0000

[Thumbnail taken from Niagara Falls, Canada]

Continued from Part II

When YouTube’s automatic-captioning was first demonstrated, it was only available for a select few channels. Around March of 2010, YouTube decided that they were ready and made automatic-captioning available to all channels! This also meant that automatic captions could now be translated as well.

The feature was introduced to provide closed captioning for videos that were not already captioned, but the question is: did it get the job done? The reception was mixed. No matter how much this innovation was appreciated, the frequent errors were noticeable. Furthermore, it was only available in English, so only for a fraction of the videos on YouTube.

All the while, the rapid flow of videos raged on… At the time, it was estimated that 20 hours’ worth of video were being uploaded to YouTube every minute! That rate has only grown since. Not to mention that this only accounts for YouTube; what about the rest of the internet?

In a CNN interview Ken Harrenstien, chief engineer of the captions team, once remarked:

What I would like to see happen is that people would see what we are doing here, and realize, “Oh, this is really useful,” and all start doing the same thing 

Whether he knew it or not, Harrenstien’s dream was not so far off…


An important event later into 2010 was the US Congress passing the Twenty-First Century Communications and Video Accessibility Act (CVAA). The act, which aimed to improve accessibility to technology, introduced regulations such as requiring video programming that is closed captioned on TV to be closed captioned when distributed on the internet. This meant that news and entertainment media that were transitioning from television to online streaming would have to bring their readily available closed captions along. So not only was YouTube going to get more caption uploads, but other websites now had to accommodate these changes by adding or improving their closed captioning interfaces.

The act also mandated video devices be designed with specifications to support closed captioning. In light of this, YouTube would bring closed captions to mobile in 2010, and introduce support for additional subtitling. In addition, they also introduced support for captioning formats used throughout the industry in 2011. Previously, positioning and stylization had required the use of YouTube’s native annotations system, but thanks to this change video distributors now had a way of uploading their own stylized captions.

The worth of stylization is best showcased through video. CPC Closed Captioning placement and formatting by 18hands

However, beyond the need for disability accessibility, there was an even bigger need for international accessibility. During Google I/O 2011, a gadget that allowed for live broadcasting of transcripts was showcased to demonstrate the capabilities of the Captions API. To be clear, these were not automatically generated captions but professional transcripts being written in real-time. That being said, using automatic translation, viewers from all around the world were able to translate these captions into their own languages. In fact, English captions accounted for only 27.33% of caption viewership; the remaining 72.66% of caption viewers were translating!

The top captioning languages during Google I/O 2011 were English, Spanish, Portuguese, French and Russian.

While all of these developments helped bring existing captions to more people, the question of how to bring high-quality captions for new videos, or videos specifically produced for the internet, still remained. To this end YouTube would pursue three approaches:

  1. Improving auto-captioning
  2. Incentivizing professional captioning/translation
  3. Providing channels with the ability to crowdsource their captions

Let’s talk a bit more about each of these in further detail…


Improving auto-captioning

If automatic captioning was to provide captions on any video, it had to work for more than just English. The second language to offer auto-captioning was Japanese in 2011. Korean and Spanish support arrived in 2012, with German, Italian, French, Portuguese, Russian, and Dutch also receiving automatic captions later in the year.

You now have around 200 million videos with automatic and human-created captions on YouTube, and we continue to add more each day to make YouTube accessible for all.

Hoang Nguyen, Software Engineer on the Captions Team, via https://blog.youtube/news-and-events/youtube-automatic-captions-now-in-six/

It didn’t mean a whole lot to be able to automatically caption 10 languages unless you could do it right. So, as speech recognition technology improved, Google and YouTube upgraded their speech recognition algorithms. Two major upgrades were when the speech recognition technique was switched to a Deep Neural Network model and later to a more specialized LSTM-RNN model, in 2012 and 2015, respectively.

During one of the most iconic moments in Google’s history at Google I/O 2015, CEO Sundar Pichai announced that they had lowered their word error rate in speech recognition down to 8%!

Things were getting a lot better, but was it enough? Some criticized the accuracy metrics as easily manipulable, and automatic captions were still no substitute for human-made captions. So, YouTube began looking into manual captioning alternatives.


Incentivizing professional captioning/translation

A harsh reality of online captioning in the earlier years of the world wide web was that it was often distributed under the counter, in forms generally associated with piracy. Much as with today’s streaming services like Netflix, subtitles for movies or shows were often distributed regionally, and it was common for subtitles in a particular language to never show up in most regions. Not to mention, a lot of this media never received subtitles or captioning at all, giving rise to a rich culture of amateur translations and fansubs. The scars of this era are visible to this day; for example, .srt, which is now an industry-standard subtitle format, originated in a program literally called SubRip.
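To make the format concrete, here is a small Python sketch that prints two caption cues in SubRip’s .srt layout: a sequence number, a start --> end timestamp line, and then the caption text, with a blank line between cues. The cue timings and text are invented purely for illustration.

def to_srt_time(seconds):
    # SubRip timestamps look like HH:MM:SS,mmm (a comma before the milliseconds).
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"

# Invented cues: (start in seconds, end in seconds, caption text).
cues = [
    (1.0, 4.2, "Welcome back to the channel!"),
    (4.2, 7.5, "Today we are talking about captions."),
]

for index, (start, end, text) in enumerate(cues, 1):
    print(index)
    print(f"{to_srt_time(start)} --> {to_srt_time(end)}")
    print(text)
    print()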

But what about media which was being produced for the internet? Predictably it wasn’t long before fansubbing or fan-captioning websites showed up for videos, such as Overstream. Still, these websites could not shake off the negative connotation surrounding them. As for professional subtitling and captioning, which was a fledgling industry when it came to more traditional media, it was struggling to go online. Captioning vendors were out there, but unable to reach the market.

This paradigm was challenged in 2010, with the founding of a platform called Universal Subtitles, today known as Amara.

Amara’s early user interface

Here’s the problem: web video is beginning to rival television, but there isn’t a good open resource for subtitling.

Here’s our mission: we’re trying to make captioning, subtitling, and translating video publicly accessible in a way that’s free and open, just like the Web.

Amara Blog https://blog.amara.org/2010/04/13/subtitles-and-captions-for-every-video-on-the-web/

Right off the bat, Amara made it clear that they wanted to become a champion of online accessibility, through collaborative captioning and subtitling. And unlike YouTube’s Closed Captioning team, Amara was determined not to be bound to a single website.

YouTube did take notice; it is very interesting to note that only a few weeks after Amara’s alpha-launch, YouTube partnered with the DCMP to endorse professional closed caption vendors as “YouTube Ready”.

You may be able to manage creating captions for your videos on your own, but sometimes you have too many videos or your video has elements that need special care. Today, thanks to support from the Described and Captioned Media Program (DCMP.org), we’re pleased to roll out a new “YouTube Ready” designation for professional caption vendors in the United States. The YouTube Ready logo identifies qualified vendors who can help you caption your YouTube videos.

Naomi Black, Caption Evangelist, via YouTube Blog

By 2013, YouTube had decided to launch their own translation network. Uploaders could now request translations on their videos, directly from YouTube’s UI and would be redirected to partner vendors.

Although short-lived, the network showed YouTube’s determination to encourage professional translation and captioning, instead of amateurish captioning. While this might have been an ideal solution for big-budget content creators, it wasn’t viable as a general solution.


Providing channels with the ability to crowdsource their captions

YouTube already had crowdsourcing before Amara, albeit not natively. There was CaptionTube, and even the older YouTube Subtitler later received some sharing features. The catch was that none of these community captions would show up on the uploader’s channel unless the captioner(s) got into contact with the uploader.

To make things easier, in 2012 YouTube introduced link-sharing, which would allow the uploader to share a link with captioners for them to be able to directly upload captions to their account. But it was still a tedious process.

Meanwhile, Amara had expanded their frontiers and was now offering “translation workspaces” for organizations such as TED with Amara Enterprise. This gave websites the ability to “moderate” community contribution, instead of having it all in the open.

And as Amara kept teasing YouTube about their error-prone automatic captions, it became clear that YouTube needed to do something. Their answer is what we know today as “community contributions”.

Reminiscent of the automatic captioning launch, YouTube would take things slowly this time around. Starting in 2014, they silently launched community contributions for Google and YouTube’s own channels. Gradually, channels like Crash Course, Barely Political and Kurzgesagt would be among the first to enable community contributions. It’s also worth noting the similarities to Amara Enterprise, in that YouTube’s community contributions also operated on separate transcription and review phases.

Community contributions for all channels would eventually go live in late 2015, to little fanfare. It was strange to see YouTube not make an announcement, seeing that you needed to enable the feature from your settings. Nonetheless, this form of captioning also had its merits, and it has come to be appreciated as more and more channels discovered that YouTube had such a feature…


Through these different captioning methods, YouTube amazingly did manage to scale the waterfall, to some extent. In 2015, YouTube’s product manager Matthew Glotzbach reported that around 25% of all videos on the site had been captioned in one form or another.

This was the golden age of closed captioning on YouTube… But was it to last?

Join us next week to find out!

]]>
https://datahorde.org/a-history-of-youtubes-closed-captions-part-iii-scaling-the-waterfall/feed/ 0
What’s in a Flash game? (What are .SWF files?) https://datahorde.org/whats-in-a-flash-game-what-are-swf-files/ https://datahorde.org/whats-in-a-flash-game-what-are-swf-files/#respond Mon, 07 Sep 2020 15:00:13 +0000 https://datahorde.org/?p=1210 Many of you 2000s kids probably grew up playing Flash games on your web browser as a kid. But have you ever wondered how exactly a Flash game works?


The simplest of Flash games is composed of one file ending with the .swf extension (Small Web Format, originally known as Shockwave Flash). That file contains all of the code and assets (such as artwork, audio, fonts, etc.), and it’s playable using a plugin called Macromedia Flash Player, later renamed Adobe Flash Player after Macromedia was purchased by Adobe. The coding language used is ActionScript, and the graphics are usually vector graphics, which are different from your usual pixel-based images.

Instead of being made out of pixels, a vector graphic stores information about the lines that form the image. That means you can scale a vector and never have to worry about the image becoming blurry, because instead of scaling an image made out of pixels, you are scaling an image made out of lines. In some cases this also reduces the file size.
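If you are curious what sits at the very front of one of these files, the container is easy to peek at. Here is a minimal Python sketch, assuming a local file named game.swf (a placeholder name), that reads the fixed 8-byte SWF header: a 3-byte signature, a 1-byte version, and a 4-byte little-endian length of the uncompressed file.

import struct

# "game.swf" is a placeholder path; any SWF file will do.
with open("game.swf", "rb") as f:
    header = f.read(8)

# Signature: "FWS" = uncompressed, "CWS" = zlib-compressed, "ZWS" = LZMA-compressed.
signature = header[:3].decode("ascii")
version = header[3]                                   # SWF format version number
file_length = struct.unpack("<I", header[4:8])[0]     # uncompressed size in bytes

print(f"{signature}, version {version}, {file_length} bytes uncompressed")

Everything after those eight bytes is the movie body: a short header with the stage size and frame rate, followed by tagged records holding the shapes, sounds, fonts and ActionScript bytecode (and in the compressed variants it is this body that gets compressed).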


Speaking of SWF, there are many other uses for Flash content beyond games. You have Flash movies, which are animated cartoons, and applets, which are small applications integrated into a webpage. Even online video players once used Flash!

Discontinuation of Adobe Flash Player

Flash Player was once the most used and most needed plugin for your web browser. You couldn’t browse the web properly without Flash Player, because tons of things needed it. In recent years, though, Flash has declined in usage and Adobe is planning on killing the plugin at the end of 2020.

]]>
https://datahorde.org/whats-in-a-flash-game-what-are-swf-files/feed/ 0
Pioneering Online Accessibility; YouTube CC History pt.2 https://datahorde.org/a-history-of-youtubes-closed-captions-part-ii-bringing-it-all-together/ https://datahorde.org/a-history-of-youtubes-closed-captions-part-ii-bringing-it-all-together/#respond Fri, 04 Sep 2020 16:55:18 +0000 https://datahorde.org/?p=1346 Continued from Part I

When we last left off, the stage was set! YouTube had gathered a group of amazing talent to not only bring closed captions to YouTube but to make it monumental!

You had top UI engineers, speech recognition veterans and the hard-boiled closed caption team of Google Video. Above all, the captions team was mostly comprised of people who were deaf or hearing impaired, or who had a loved one who was deaf or hard of hearing. You could be sure they were going to give it their all.


The team immediately got to work and their efforts bore their first fruit in late August of 2008.

You can add captions to one of your videos by uploading a closed caption file using the “Captions and Subtitles” menu on the editing page. To add several captions to a video, simply upload multiple files. If you want to include foreign subtitles in multiple languages, upload a separate file for each language. There are over 120 languages to choose from and you can add any title you want for each caption. If a video includes captions, you can activate them by clicking the menu button located on the bottom right of the video player. Clicking this button will also allow viewers to choose which captions they want to see.

The YouTube Team, via the official YouTube Blog https://youtube.googleblog.com/2008/08/new-captions-feature-for-videos.html

Some of the first channels to feature closed captions were the channels of BBC Worldwide, CNET, UC Berkeley, MIT and Gonzodoga.

CAPTIONS AND SUBTITLES TEST by TheDawgProductions is unique in that it is quite possibly the first video where the uploader was consciously aware of the feature, unlike the above examples, which were captioned after upload. Amazingly, it predates the blog announcement for CC.

The YouTube Blog was nothing short of a beast with over 2 million subscribers at the time, but feeling that the announcement wasn’t loud enough, they decided to also make a video announcement on the official YouTube channel:

An interesting part of this video announcement, in addition to the rickroll around 0:34 into the video, is the mention that captions and subtitles are also helpful for people who speak other languages. This was nothing new, since you could add caption tracks in multiple languages even back in the Google Video days, but the way this remark is juxtaposed into the video suggests that they were teasing a new feature.

With the announcement of machine translation in November, we got to see just what that feature was! Viewers could now translate closed captions into whatever language they chose, via Google Translate. The feature has changed a bit over the years, but in its earliest form you could translate any captions which the user had uploaded into any of the languages available on Google Translate: 

Demonstration of Captions and Translation by captionmic

It’s worth noting, however, that these translations were not permanent, they were designed to be dynamic as Google Translate kept improving over time. If the uploader wanted to ensure viewers see a translation in a particular language they would still have to add that in themselves.

On top of this, just a few days later, closed caption support was added to embedded videos, so you no longer had to be on YouTube to view closed captions. The captions team was on fire!


After so many updates in quick succession, the captions team would fall silent for a few months, not because they were exhausted, but because they had got to work on something big.

Nonetheless, the first half of 2009 saw some interesting in-house projects utilizing captions. This was fitting since after so many updates improving the viewing experience on YouTube, it was now about time that some work was done to improve the process of content creation. The first of these projects was CaptionTube, a richer editor than YouTube’s built-in caption editor, in 2009 anyway:

CaptionTube also had another interesting feature, reminiscent of community caption contributions. Unlike on YouTube, where you need the uploader’s permission to caption a given video, you could caption any video of your choosing. Even if the uploader didn’t want closed captions on their video, you could keep a copy for yourself. And if the uploader did want captions, you could just export your captions and email them over.

I had features for converting various subtitle formats, uploading and previewing them, setting the language, alongside a YouTube video in a UI that looked like Premiere or Final Cut. I wrote CaptionTube and the Python API client library myself, but I had nothing to do with the internal caption infrastructure.

I think roughly 1.5 million videos were captioned with it (a drop in the bucket). I supported the service for ~3 years until YouTube had better internal support, I don’t know how big that team was. After that, I turned it down as they had added the collaboration features. I had thought about adding crowd-sourcing to CaptionTube, but I didn’t have the time and the internal caption team was working on it.

John Skidgel, creator of CaptionTube, personal communication via email. 

Another project was the aptly named google-video-captions, meant to be a dataset of transcripts for videos on Google’s own YouTube channels.

The google-video-captions project has two goals:
* To provide a public corpus of Creative Commons licensed captions that were transcribed from Google videos.
* To enable community-based translation of Creative Commons licensed caption files for these same videos.

Naomi Black, creator and maintainer of google-video-captions, project description

This project was led by Naomi Black, who at the time was managing YouTube and Google’s video channels. Unfortunately, this project died a lot sooner than CaptionTube, as updates ceased around August and eventually Google Code, where the project was maintained, went defunct altogether. 

Yours truly has taken the liberty of exporting what remains of this project to GitHub, in hopes that someday the idea might be revived. With improvements to the YouTube API over the years, the task of transcript retrieval should now be a whole lot easier.
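As a sense of how much easier, here is a rough sketch against the YouTube Data API v3 captions.list endpoint, which lists the caption tracks attached to a video (metadata only, not the caption text itself). The API key and video ID below are placeholders, and depending on Google’s current policy the call may require full OAuth credentials rather than a plain key, so treat this as a starting point rather than a recipe.

import requests

API_KEY = "YOUR_API_KEY"      # placeholder: issued through the Google Cloud console
VIDEO_ID = "SOME_VIDEO_ID"    # placeholder: the 11-character ID of the video

response = requests.get(
    "https://www.googleapis.com/youtube/v3/captions",
    params={"part": "snippet", "videoId": VIDEO_ID, "key": API_KEY},
    timeout=10,
)
response.raise_for_status()

for item in response.json().get("items", []):
    snippet = item["snippet"]
    # trackKind tells uploaded tracks apart from automatically generated ones.
    print(snippet["language"], snippet["trackKind"], snippet.get("name", ""))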


All the while, Google continued to make strides in speech recognition. In March, Voicemail transcription debuted for Google Voice:

It was met with a mixed reception, to say the least. While admittedly innovative, the accuracy wasn’t all that good. On top of that, the processing of people’s speech raised privacy concerns, as indicated by comments left on the video and elsewhere. Google was not going to be as hasty the next time they unveiled a major speech-to-text product.

hey keith this is matt mail drink trying out the anti pants go go voice i translator away this is making into an S M S this is gonna be too long later sent you a transcript on now this would be normal so i wonder how accurate this thing will be in translating when i have to say and today i had a salad and okay double cheeseburger and what else i am comforting over some brainstorm notes and when she say something that michael birthday has if you extend the well and yeah i goes hello so the translate laughing to okay i think that’s about

Google Voice – Transcript Test by Spudart and Sparx

After months of working silently, the captions team unleashed their pièce de résistance: Automatic Captioning! Having learned their lesson from Google Voice, this feature was initially exclusive to a select few channels, primarily educational ones.

A day later, they would give a thorough demonstration of this, and of another feature called Automatic Timing, to an audience in their office in Washington D.C. Members of the audience included accessibility leaders from the NAD, Gallaudet University, the AAPD and even Marc Okrand. This is one of those YouTube videos that makes you wonder why it hasn’t reached a million views yet! If you’ve got an hour to spare, it is a must-watch!

After a brief overview by Jonas Klink, then accessibility product manager at Google, we cut to Vint Cerf, who delivers the introduction. He opens with how “to organize the world’s information and make it accessible and useful” entails accessibility for people who are deaf, hard of hearing, visually impaired, or motor-impaired.

So I want to tell you, first of all, why accessibility’s personally important to me. Sigrid, who is in the audience over there — wave your hand, Sigrid — and I are both hearing-impaired. Sigrid was totally deaf for 50 years. She now has two cochlear implants. And they work wonderfully well. They work so well, we had to buy a bigger house, because she wanted bigger parties, because she could hear. So this is a technology which is spectacular. I’ve been wearing hearing aids since I was 13. You can do the math. That’s 53 years.

So both of us care a great deal about how technology can help people with various impairments get access to information and be connected with the rest of the world. So quite apart from my job at Google, I have great personal interest in what we’re talking about today.

Vint Cerf, Announcement on Accessibility and Innovation, 3:41 

At one point he makes a slip-up, stating that YouTube introduced captions in 2006, when it was actually Google Video which introduced closed captions. Next, after talking about their history together, he hands the microphone to Ken Harrenstien, who you might recall was the chief engineer on closed captions during the Google Video days.

Ken Harrenstien, who had been waiting for this moment for at least three years, continues with a showcase of caption features that have been added to YouTube over the past year: settings to adjust the size of captions, to turn the background off, etc. But at the end of this section, his optimism up until this point begins to fade as he addresses the sheer amount of uncaptioned videos.

To provide a visualization, he takes out a labeled bottle of water and tells the audience to assume this bottle represents all the videos that are currently captioned. Then he opens a clip of Niagara Falls from YouTube and tells the audience this represents all of the videos being uploaded to YouTube.

This is our problem. Remember what Vint said earlier? Every minute we stand here and talk, people are uploading 20 to 23 hours of video. Not minutes. Hours. Not 23 videos themself. We’re talking hours. So tons. And that’s every minute, every day. Every month. It just — it’s coming in.

So the question is, who’s going to bottle that water?

Ken Harrenstien, Announcement on Accessibility and Innovation, 25:08 

How to keep up with this perpetual flow? He proceeds to play a clip from that year’s Google I/O, with captions switched on, and then turns to the audience to ask if they notice anything different.

YEEESSSS!

People who notice the mistakes eventually guess that the captions are machine generated, much to Ken Harrenstien’s amusement. Automatic captioning is finally on YouTube! True to the lessons learned from Google Voice, instead of launching the feature for all users, automatic captioning would initially only be available to a few partner channels.

Back then the process was a lot slower: you got a warning saying that the feature was experimental, and then you would have to wait some time for the transcript to be generated. They were taking things easy. It would be months before the feature was allowed on other channels.

Still, it was a million times better than nothing, and what’s more, there was an alternative for all the other channels. Automatic timing was a new feature that allowed users to upload an existing transcript and would generate timestamps to align the text with speech. Believe it or not, this feature is still on YouTube to this day!

The final portion of the demonstration is Naomi Black, the same Naomi who had worked on building a public corpus of captions, showcasing these features from the uploader’s perspective.


As the audience applauds, the question lingers: “Will automatic captioning be able to keep up with the astronomical rate of video uploads?” It brought hope, that was for sure, but the unreliable accuracy still left something to be desired.

Join us next week, when we talk about the ingenious attempts to contain this waterfall! 

]]>
https://datahorde.org/a-history-of-youtubes-closed-captions-part-ii-bringing-it-all-together/feed/ 0