google – Data Horde https://datahorde.org Join the Horde! Sat, 11 Sep 2021 00:51:02 +0000 en-US Help Archive Team Archive public Google Drive files before September 13! https://datahorde.org/help-archive-team-archive-public-google-drive-files-before-september-13/ https://datahorde.org/help-archive-team-archive-public-google-drive-files-before-september-13/#comments Sat, 11 Sep 2021 00:50:54 +0000 https://datahorde.org/?p=2637 On September 13, Google is going to start requiring longer URLs to access many Google Drive files, breaking links to public files across the web unless users opt out! Because of this, Archive Team has launched a project to archive as many publicly-available Google Drive files as possible and make them accessible on the Internet Archive Wayback Machine. (Note that video files are not included at this time due to their size.)

You can help! Simply follow the steps to download and run an Archive Team Warrior, and then select the Google Drive project. (You can also run the project in a Docker container, with atdr.meo.ws/archiveteam/google-drive-grab as the image address.)

Additionally, people with lists of public Google Drive file URLs are encouraged to share them so they can be archived.

In order to stay up-to-date with the project and be reachable in case of an issue, project contributors are encouraged to connect and stay connected to the project discussion channel, #googlecrash on irc.hackint.org, also available through webchat.

Archiving progress statistics for this project are available on the Archive Team project tracker, and source code is available on GitHub.

This Week In Archiving 06/28/2021 https://datahorde.org/this-week-in-archiving-06-28-2021/ https://datahorde.org/this-week-in-archiving-06-28-2021/#respond Mon, 28 Jun 2021 16:30:45 +0000 https://datahorde.org/?p=2426 In Memoriam

Near/Byuu, a longtime SNES manual preservationist, author of and contributor to numerous emulation projects, and friend to many game preservationists, passed away on June 27, 2021.

Earlier that day, they had sent out a series of Tweets to the effect of a suicide letter. Wishing to remain anonymous, the last person to have talked to Near prior to their death would soon contact Hector Martin “marcan”, believing them to have taken their own life while on the phone. Several hours later the event was confirmed by local police.

This tragic turn of events followed episodes of harassment, which Near detailed in their final words. Their parting request was that they be remembered for their many contributions to the community, and not for what they were about to undertake.

Shutdowns

Last Wednesday, June 23, YouTube announced that it will automatically set all unlisted videos uploaded prior to 2017 to private one month later, on July 23. The difference is that unlisted videos are hidden from search results but can be accessed by anyone with the link, whereas private videos are inaccessible to other users unless the uploader manually grants access.

While channels can opt out to keep their unlisted videos unlisted rather than private, archivists have noted that many inactive channels are unlikely to do so given the short timeframe.

In the way of archiving projects for unlisted videos, there has already been much discussion and some organization on #down-the-tube on hackint, Hacker News, r/datahoarder and the Distributed YouTube Archive. Alas, no project is in motion yet.

The Unlisted Videos website, which was made specifically for the purpose of collecting links to unlisted videos, has been scraped or is being scraped by several groups, amounting to about half a million videos. There is also Jopik’s searchable collection of 4.5 million unlisted videos, which is a monument in its own right. That being said, these are only links, and the video files themselves have not yet been mirrored. So be sure to stay tuned for upcoming projects.

To spread awareness of the situation, we are doing a countdown of unlisted videos on the Data Horde Twitter account.

This upcoming change from YouTube comes with a similar update to Google Drive, which will render many shared files inaccessible to users who have not accessed them prior to a certain date.

Updates

In support of the efforts to archive unlisted videos, Sponsorblock has introduced a new feature to detect and anonymously submit links to unlisted videos that users might be watching.

If you go to this currently unlisted video, Wakasensei (Mitsuteru Ueshiba) - 47th All Japan Aikido, with Sponsorblock installed, you will see a little infobox on the right side of the video informing you that unlisted video links are being collected. You can help!

The Flash Player emulator Ruffle is now a bit easier to install. Ruffle has finally been added to the Chrome Web Store as an extension, so you can run it from the comfort of Chromium browsers.

In other news, the mod community/archive Gamebanana suffered a major outage over the weekend. Thankfully, as it turns out, this hiccup was only the result of a billing glitch on their host’s side. The site is now up and running once more.

Can you believe our host accidentally suspended 16 of our servers due to a billing glitch, and nobody was around to fix it because it’s a Sunday. This is the biggest host blooper we’ve ever encountered in 20 years.

tom, Gamebanana Admin

Discoveries

Image Copyright: Mojang
Screenshot taken by MewtwoTheGreat

Members of Omniarchive, a group dedicated to archiving old lost versions of Minecraft, managed to recover the elusive Alpha 1.1.1 version on June 25, 2021. The first of many Seecret updates, Alpha 1.1.1 was notable for being online for only a few hours before the Alpha 1.1.2 hotfix.

Archivist ProffApple found a tweet made over a decade ago, on September 18, 2010, the day Alpha 1.1.1 came out, by someone who had just downloaded the update. Turns out, they still had the game files lying around!

You can read more about the story on this Kotaku Article by Zack Zwiezen and this PC Gamer article by Jonathan Bolding.

Google Cuts Free Unlimited Storage in Photos, Drive https://datahorde.org/google-cuts-free-unlimited-storage-in-photos-drive/ https://datahorde.org/google-cuts-free-unlimited-storage-in-photos-drive/#respond Thu, 12 Nov 2020 06:22:45 +0000 https://datahorde.org/?p=1746 On Wednesday, Google announced that it will be ending free unlimited storage of high quality uploads in Google Photos, as well as free unlimited storage of Docs, Sheets, Slides, Drawings, Forms, and Jamboard files. This change will go into effect on June 1, 2021. Existing files and photos will remain unaffected, but editing a Google Drive file will make it count against the storage limit.

This change will not apply to Sites, Keep, Blogger, or YouTube.

Google also announced that if a user does not use Gmail, Google Drive, or Google Photos for two years, they may delete the user’s data from the product within which the user is inactive. Additionally, if a user remains above their storage limit for more than two years, their data may be deleted.

These changes align with other recent changes made by Google. For example, starting Friday, November 13, files that have been in a user’s Google Drive trash folder for more than 30 days will be permanently deleted. Additionally, Google recently replaced G Suite with Google Workspace, making unlimited storage only available on enterprise-level plans.

All the while, Google has been increasingly encouraging customers to subscribe to their new Google One service, which provides expanded storage and other Google benefits to customers.

Unusual Beginnings on Google Video; YouTube CC History pt.1 https://datahorde.org/a-history-of-youtubes-closed-captions-part-i-unusual-beginnings/ https://datahorde.org/a-history-of-youtubes-closed-captions-part-i-unusual-beginnings/#respond Fri, 28 Aug 2020 22:01:30 +0000 https://datahorde.org/?p=1125 Have you ever watched a video where you couldn’t understand a word of what was being said? Maybe the volume was too low or the people were whispering. Perhaps you are hard-of-hearing or even deaf. Or it was just in a foreign language. These are no reasons to be ashamed! You’re only one click away from seeing what was being said using the CC button! 

Today, the closed caption “CC” is one of the most recognizable icons in the world. A good part of that is due to the advent of captioning for online video and streaming services. In particular, YouTube once stood out as an early adopter and strong advocate for closed captioning.  

Option to contribute subtitles/CC on YouTube

Alas, in recent years they have made decisions that have taken them from caption hero to villain. Their latest act of treachery is the decision to remove community contributions, a feature that allowed viewers to submit captions for videos they wished to transcribe or translate.

As a sort of countdown, until community contributions are gone for good, I am going to be recounting the history of closed captions on YouTube, over the next few weeks. I hope our readers will find this to be a fascinating retrospective on a much-overlooked technology…


Classic YouTube Logo from 2005

Our story begins in 2005, with the inception of two websites: YouTube and Google Video. Now, online video was by no means a new phenomenon; the concept of a viral video was already a decade old at this point. But the video culture of the time was quite different. Videos were big files, so if you wanted to share a clip with your friend via email, you needed to lower the quality and duration. If you wanted to have it available for the world to see, you would need to host it on a web server, and even then cost would be the least of your worries. What had been a daunting challenge up until this point was building a platform where users could freely upload videos, while remaining profitable. Both YouTube and Google Video would end up becoming early success stories in that regard.

It would not be long before these two websites, especially YouTube, became a part of our daily lives. What I am sure will come as a surprise to some readers, is that these two websites share a very much intertwined history, especially when it comes to captioning. 

Google Video Logo

Throughout 2005, Google Video was definitely leading the competition. But once it started to lose ground to YouTube in 2006 it never recovered. Still, Google seems to have believed that there was some hope for Google Video and to that end, they competed to become the better platform by introducing new features. One such feature was, in fact, closed captioning.

(There is supposed to be a Google Trends graph here, but if it somehow fails to load click here)

Search frequency for Google Video and YouTube between 2005 and 2006.

Banner of the Google Video Blog

On 19 September 2006, support for closed captioning of videos was announced on the Google Video Blog:

Although many of us are responsible for making this possible, it’s particularly meaningful to me because I’m not only an engineer fortunate enough to work on Google Video — I’m also deaf. In some ways this reminds me of when closed-captioning (CC) was first introduced; before that, little on TV made sense and the only movies worth paying for were foreign films, because those were the only ones with subtitles! I now have the same sense of hope that I did then, when you could finally see visible progress and knew for sure that however long it took to perfect things, we really were on the way.

Ken Harrenstien, Google Video blog 

For the first time ever, you had a website where you could both host and stream your own videos and captions. At the time, the CC button seemed like a “subtitles on” option, which you could expect from TV or DVDs. Unfortunately, this historical development was quickly overshadowed by Google’s acquisition of YouTube, only three weeks later in October. Nonetheless, it was quite the accomplishment and even YouTube takes this to be the epoch for closed captions on their website, as evidenced by future blogposts and conferences.

Soon after, Google Video began a slow descent into obscurity, as the development team slowly migrated to work on YouTube. Although the website went defunct altogether, we still have some snapshots from this era. The thumbnail for this post is a screenshot of one of the first-ever captioned videos uploaded, titled “Google Video & YouTube Support Closed-Captioning”. Although this photo only shows us the UI, fortunately, the “Me at the zoo” of closed captions survives on YouTube! 

The uploader, Dan Greene, who provided us with a lot of insight during my research, later uploaded the video to YouTube in 2009:

Now you might be wondering why anyone would wait so long. The fact of the matter is, it would be another 2 years until YouTube received its own native support for captions. Ken Harrenstien and the rest of the core team working on captions would remain on the Google Video side of things for many more months.

Undaunted, the Google Video team would continue to bring innovations to captioning. While their accomplishments only survive in fragments, a good place to be looking is Google’s own patents, a lot of which have Ken Harrenstien’s name on them. There is US9710553B2 relating to UI design for closed captions and US20140301717A1 relating to support for multiple caption tracks. 

There is one patent in particular which is especially relevant to the community contribution feature, US7992183B1: Enabling users to create, to edit and/or to rate online video captions over the web. To date, this is the earliest known proposal for any form of community contribution to closed captions, on Google’s products anyway. Up until this point, if an uploader hadn’t added captions for a video, that was it, you could not do anything about it. There were services like Overstream, which allowed people to caption online videos, but not only were these relatively unknown, you had to actively hunt for a captioned video. This patent describes a method that could have potentially changed that.

Figure illustrating options for captions in different languages, there are 3 captions available in English authored by Josh M., Kim L. and J. Doe with a score of 4, 3 and 1 respectively.

This form of community contribution would operate on a rating-based system: multiple users would be able to submit closed captions, and viewers would decide which captions were the best by rating them on a scale from 1 to 5. This differs from the modern system, which enforces one translation per video and is split into a submission phase and a review phase. In practice, one could imagine this early concept working much faster. I was unable to find any evidence that this idea was ever realized, but at the very least it goes to show that the need for community contribution was there, even back in 2007.
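To make the rating idea concrete, here is a minimal Python sketch of how a player might pick which caption track to show. It is not taken from the patent: the `CaptionTrack` structure, the `best_track` helper, the use of a plain average, and the individual ratings are all assumptions made for illustration. Only the three author names and their scores of 4, 3 and 1 come from the figure described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class CaptionTrack:
    """One viewer-submitted caption track (hypothetical structure)."""
    language: str
    author: str
    ratings: list[int]  # individual viewer ratings, each from 1 to 5

    @property
    def score(self) -> float:
        # Assumption: the score is a plain average of all ratings.
        # An unrated track scores 0 so it sorts below any rated track.
        return mean(self.ratings) if self.ratings else 0.0


def best_track(tracks: list[CaptionTrack], language: str) -> CaptionTrack | None:
    """Return the highest-scoring track in the given language, if any."""
    candidates = [t for t in tracks if t.language == language]
    return max(candidates, key=lambda t: t.score, default=None)


# Invented ratings that average to the scores shown in the patent figure.
tracks = [
    CaptionTrack("en", "Josh M.", [4, 4, 4]),
    CaptionTrack("en", "Kim L.", [3, 3]),
    CaptionTrack("en", "J. Doe", [1]),
]
chosen = best_track(tracks, "en")
print(chosen.author if chosen else "no captions available")
```

Because every submission is live as soon as it is rated, a scheme along these lines would need no separate review phase, which is one reason to imagine it working faster than the modern system.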


Moving into 2008, Google really started to pick up the pace. On 4 June, we saw YouTube’s answer to captions: Video Annotations!  

We’re happy to announce a new way to add interactive commentary to your videos — with Video Annotations. With this feature, you can add background information, create branching (“choose your own adventure” style) stories or add links to any YouTube video, channel, or search results page — at any point in your video. 

The YouTube Team, The YouTube Blog

Video annotations were designed to be a lot more versatile than captions, not solely restricted to descriptions of what is being said or happening on the screen. Still, seeing as there still was no native closed captions feature, it would not take long until people figured that they could use these video annotations for subtitling or captioning videos.

Using annotations:

Q: What will you show us today?
A: I'm gonna show you some nice new armors, how we create them and such
Original Video: Aion – New Armors (interview) – WITH ENGLISH ANNOTATIONS!
Annotations are no longer available on YouTube, so the ones you see here were retrieved via https://invidious.snopyta.org/watch?v=mCg54B69aY4&iv_load_policy=1

And if you are still not convinced that this was a response aimed at closed captioning, get a load of Google Video’s newest feature, announced the very next day: Closed Captioning Search! This made it possible to search not only for a particular video but also through its contents. An example use case could be searching for a particular talk:

Here’s a nice example – search for [“that’s a tremendous gift”] . Make sure you’ve selected List View, and you should see a video featuring Randy Pausch. Clicking on the “Start playing at search term (50:16)” link will take you to a point slightly before the appearance of that caption.

Ken Harrenstien, Google Video Blog
Results for "that's a tremendous gift", an option to skip to the 50:16 mark of Randy Pausch's "Really Achieving Your Childhood Dreams" speech is available.
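As a rough illustration of the behaviour described in the quote, the Python sketch below searches a list of timed caption cues for a phrase and returns a start time slightly before the match. It is not Google's implementation: the cue format, the `find_phrase` and `as_timestamp` helpers, the two-second lead-in, and the sample cue times and filler text are all assumptions.

```python
from __future__ import annotations

# Each cue is (start time in seconds, caption text).
Cue = tuple[float, str]


def find_phrase(cues: list[Cue], phrase: str, lead_in: float = 2.0) -> float | None:
    """Return a start time slightly before the first cue containing the phrase."""
    needle = phrase.casefold()
    for start, text in cues:
        if needle in text.casefold():
            # Back up a little so the viewer hears the phrase in context.
            return max(0.0, start - lead_in)
    return None


def as_timestamp(seconds: float) -> str:
    """Format a number of seconds as minutes:seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Invented cue times and filler text, for illustration only.
cues = [
    (3014.0, "placeholder caption before the match"),
    (3018.0, "that's a tremendous gift"),
    (3023.0, "placeholder caption after the match"),
]
start = find_phrase(cues, "That's a tremendous gift")
print(as_timestamp(start) if start is not None else "not found")
```

This simple version only matches a phrase that falls inside a single cue. A phrase split across two consecutive cues would need the cue texts to be joined before searching.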

Believe it or not, this feature was eventually brought over to YouTube, though it works a bit differently.

Sad to say, these features were not much appreciated at the time, as YouTube was suffering from poor layout changes (see comments on the old blog page) and Google Video was suffering from serious stability issues. There was, however, a silver lining to all of this.

If you remember visiting YouTube in 2008, you might recall some of the interesting gadgets and features they had introduced for the upcoming presidential election. Let’s talk about a little project called Gaudi.

The premise of Google Video’s caption search, being able to search for things people say, might not sound very enticing. But what if you wanted to search for things important people, say, politicians, have said in the past? So Google’s Speech Research Group developed a tool called Gaudi, which not only allowed you to search through the speeches of presidential candidates but also generated transcripts for relevant news and politics videos as they were uploaded. By bundling speech-to-text

Gaudi is gone now, but it was once available on Google Labs, as a gadget on iGoogle and it was even embedded on YouTube’s YouChoose page offering the latest updates on the presidential campaigns. It even received some media coverage.

If you would like to learn more about the behind the scenes for this one, one of the authors, Michiel Bacchiani, was kind enough to upload a copy of their publication to ICASSP ’09, on his website.


So now we had a form of speech-to-text on the largest video sharing website in the world, no matter how limited. But what to do with this technology? Combining it with video annotations would be far from trivial. YouTube still didn’t have closed captions either, forcing people to use websites like YouTube Subtitler. Perhaps now was the time to bring the Google Video captioning features over to YouTube!

In the coming months, members from the team that worked on Gaudi, Google Video’s division responsible for closed captions and the existing YouTube team would come together to bring closed captions to YouTube. 

Join us next week, to see how all these little pieces would come together to make a whole, truly greater than the sum of its parts.

The Impact of YouTube Removing Community-Contributed Closed Captions https://datahorde.org/the-impact-of-youtube-removing-community-contributed-closed-captions/ https://datahorde.org/the-impact-of-youtube-removing-community-contributed-closed-captions/#respond Fri, 14 Aug 2020 18:50:06 +0000 https://datahorde.org/?p=1196 Closed Captions have been an essential feature on YouTube for nearly 12 years. They’ve made the platform more accessible, not only by serving transcriptions for deaf and hard-of-hearing users but also by serving as a medium for translating videos into a multitude of languages.

Screenshot of Michael from VSauce
Spooky Coincidences by Vsauce, a video which allows for community contributed closed captions.

For more than half of their existence, YouTube has supported one form or another of contributing closed captions on videos which don’t already have them. Which is why two weeks ago, it came as an unpleasant surprise when Google announced that they were removing community contributions in September:

Community contributions will be discontinued across all channels after September 28, 2020. Community contributions allowed viewers to add closed captions, subtitles, and title/descriptions to videos. This feature was rarely used and had problems with spam/abuse so we’re removing them to focus on other creator tools. You can still use your own captions, automatic captions, and third-party tools and services. You have until September 28, 2020 to publish your community contributions before they’re removed.

Google Support Page, retrieved 13 August 2020

Now to be clear, closed captions aren’t going anywhere.

  • Video uploaders will be able to continue uploading their own closed captions,
  • Captioning services such as Amara will remain available,
  • And of course, YouTube’s own automatic captions are here to stay.

But no longer will viewers be allowed to contribute captions. Previously published captions will still be online, submissions will just not be allowed anymore. It is unknown if attribution for previously published captions will remain intact.

Even more confusing was Google’s justification for this removal, which they provided on the YouTube Support Community. Lamenting that content creators and viewers expressed dismay at the frequent abuse and low quality in community captions, they attributed the relatively “rare usage” of this feature to the bad name it has made for itself. It was a broken and unwanted feature, which warranted a discontinuation…


Let’s talk a bit about just how “rare” that usage statistic is.

… the feature is rarely used with less than 0.001% of channels having published community captions (showing on less than 0.2% of watch time) in the last month.

As much as the wording makes the feature seem insignificant, note that it is being expressed in terms of channels. This does not paint a very clear picture of how many viewers are dependent on this feature.

number of youtube channels
There are approximately 16 thousand channels with over a million subscribers, that’s at most 0.0005% of all channels.

Recently tubics, a company that provides SEO for content creators on YouTube, published a blog post presenting some interesting statistics on YouTube channels. Using SocialBlade data, they determined that a very small percentage of all the channels on YouTube have the majority of viewership. Even by the most optimistic estimate, only 0.006% of all channels have over 100 thousand subscribers, of which only 0.0005% have over a million. This shows us that even a feature which is utilized by this small a fraction of channels can have an impact on potentially millions of viewers.


The ulterior motive here is likely to promote Google’s own automatic captions and especially their automatic translation. Their accuracy has been drastically improving over the years, and in some cases this really is the only reliable way to translate a video into a less spoken language. But that being said, is it a replacement for community contributions? I think not!

There are a lot of use cases unique to community contributions, which aren’t offered by any of the remaining alternatives.


  • When a user uploads their captions or Google generates them, it’s often a one-time process. If there’s a mistake, no one is going to go back to fix it. But with community contributions, users can build off of previous work by correcting one another, not unlike how Wikipedia works.
Typo: "Which sits in Iraq" instead of "Which sits in a rack"
What are Microservices? by freeCodeCamp.org,
There is a typo in the captions: “Which sits in Iraq” instead of “Which sits in a rack”.
A Rack
Thanks to Community Contributions being allowed on this video, anyone can go in and fix it!
  • Even though community contributions carry the risk of sabotage, they also provide viewers with the power to moderate. Someone snuck in a joke on a video you were watching? Just go into the caption editor and edit it out!
Sincerely, Me || Dear Evan Hansen Animatic by szin,
The 🙁 at 0:21 is likely an artifact from translating the original captions from Polish to English (where there was text), but it’s not uncommon for people to sneak in emoticons or naughty easter eggs where they don’t belong. Community Contributions spare channels from having to moderate these manually by delegating the task to channel viewers.
  • Where automated captions are stuck with plain text, and uploaders are limited by the time they’re willing to invest to stylize their captions, there are people out there waiting to tap into their potential. YouTube supports a plethora of caption/subtitle formats, which a seasoned captioner can use to add color, formatting and emphasis!
Scenarist Closed Captions
YouTube supports Scenarist Closed Captions… but how well? by Jibberuski
SCC is but one of many captioning formats. If you’ve ever watched a video where captions weren’t locked into the bottom of the screen, it was likely captioned in SCC or EBU-STL.
  • In translation, there are times when we don’t want it to be too precise. An example could be explaining the meaning of a word, the wordplay in a joke… And although there are techniques to recognize proper names, there are sentences that automated translation is not designed to handle.
1984 by George Orwell, Part 1 by Crash Course
Around the 2nd minute mark, John Green explains the parallels and contrast between the names of Winston Smith, the protagonist of 1984, and British Prime Minister Winston Churchill. Here, the consciously authored Spanish captions are able to highlight the significance of the words Church and Hill.
By contrast, the automatically generated captions immediately translate Churches into Iglesias and actually fail to translate Hills into Colinas.

As you can see, there are all too many reasons not to remove community contributions. Which is why more and more YouTubers are campaigning against the change. There’s a change.org petition which has already been signed by over 470,000 people. I don’t know about you, but that sounds like a pretty significant figure.

At the very least, Google could try a compromise. It’s still possible to live in a world where the viewers can contribute to captioning, and the process can be automated. Take Google Translate, for instance: they’ve been able to improve their accuracy a lot through the “suggestion” feature.

Making captioning exclusive to the channel owner and Google’s own tools is going to make a lot of lives harder, when people are dying to make those same lives easier. So go on and spread the word! What will your contribution be?

Classic Google Sites To Disappear Starting November 1, 2020 https://datahorde.org/classic-google-sites-to-disappear-starting-november-1-2020/ https://datahorde.org/classic-google-sites-to-disappear-starting-november-1-2020/#respond Sat, 08 Aug 2020 23:10:00 +0000 https://datahorde.org/?p=1086 I recently received an email from Google stating that websites created in the original version of Google Sites need to be converted to the new Google Sites format. Sites that haven’t been viewed or edited since before January 2018 will be inaccessible to the public as of November 1, 2020, while active sites will be inaccessible as of September 1, 2021.

While the new version of Google Sites launched in 2016, it will still be possible to create new classic Google Sites until November 1, 2020.

After November 2020, inactive classic Google Sites will have a private copy exported to Google Drive. This copy will not necessarily be easy to republish at its original URL in the future. After September 2021, active classic Google Sites will have a private copy exported to Google Drive and a private draft copy created in the new Google Sites, meaning they can easily be republished in the future.
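For anyone building a list of sites to archive, the timeline above can be turned into a simple ordering rule. The Python sketch below is only an illustration: the function names and example URLs are hypothetical, and treating January 1, 2018 as the exact activity cutoff is an assumption, since the email only says "before January 2018".

```python
from __future__ import annotations

from datetime import date

# Dates from the announcement; the exact cutoff day is an assumption.
INACTIVE_CUTOFF = date(2018, 1, 1)     # no views or edits since before this date
INACTIVE_SHUTDOWN = date(2020, 11, 1)  # inactive classic sites go offline
ACTIVE_SHUTDOWN = date(2021, 9, 1)     # active classic sites go offline


def shutdown_date(last_activity: date) -> date:
    """Return the date a classic site becomes inaccessible to the public."""
    if last_activity < INACTIVE_CUTOFF:
        return INACTIVE_SHUTDOWN
    return ACTIVE_SHUTDOWN


def archive_priority(sites: dict[str, date]) -> list[str]:
    """Order site URLs so that those disappearing first are archived first."""
    return sorted(sites, key=lambda url: (shutdown_date(sites[url]), url))


# Hypothetical sites with the date each was last viewed or edited.
sites = {
    "https://sites.google.com/site/example-active": date(2020, 3, 14),
    "https://sites.google.com/site/example-dormant": date(2015, 6, 1),
}
for url in archive_priority(sites):
    print(shutdown_date(sites[url]).isoformat(), url)
```

In practice the last-activity date of someone else's site is rarely known, so an archivist would likely treat every classic site as due on the earlier date.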

This news is important because it means that many websites published with classic Google Sites will be inaccessible to the public if the site owner is not around, not able to, or doesn’t care to take action to migrate it to the new Google Sites or export the content and transfer it elsewhere.

Although classic Google Sites launched in 2008, some of the content that will be lost actually dates back as far as 2006 and was originally created on the predecessor to Google Sites, known as Google Page Creator. When that platform shut down in ~2008-2009, many of the sites were converted to classic Google Sites. The URLs hosted on *.googlepages.com actually still redirect to their associated Google Sites, but, given their age, a large portion of these sites are probably inactive and will disappear on November 1, 2020.

Please spread the word and start making lists of classic Google Sites to archive in advance of these upcoming dates.

Further information is available on the Google Sites Help Center.
