Google rolls out change breaking shared Drive Links
Sun, 27 Jun 2021 19:06:02 +0000

Some users may be in for a surprise when they attempt to access a Google Drive link after September 13, 2021, the day Google will roll out a security update. If you have never accessed a certain file before, and the file owner(s) have not opted out of the update, a new URL containing a key will be required to access the file.
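To illustrate (the file ID and key below are made-up placeholders, not real values), a file that used to be shared as

https://drive.google.com/file/d/1a2B3c4D5e6F/view?usp=sharing

will, for anyone who has never opened it before, only be reachable through an updated link carrying an extra resourcekey parameter, along the lines of

https://drive.google.com/file/d/1a2B3c4D5e6F/view?usp=sharing&resourcekey=0-AbCdEfGhIjKlMnOp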

While Google states that this change will make link sharing more secure, not all users will necessarily welcome it. The change will first roll out to Google Workspace. As an administrator, you can take action before July 23, 2021 to decide whether or not to enforce the link change. By default, admins are opted in and their organization's links will be updated automatically. If an admin opts out instead, their organization's members will be notified on July 26, 2021 and can decide individually whether or not to opt out. The update comes into effect on September 13, 2021. Note that admins can still change organization settings until the final deadline.

As a regular (free) user, you will receive the update notification on July 26, at which point you may choose to opt out before September 13, 2021. So, if you own files on Google Drive that you have made public, you should check whether you want to cancel the update or replace all of your links with the updated versions. If you take no action, the update will be enforced on September 13, 2021, rendering some of your files inaccessible to users who have not accessed them prior to that date.

[Image: Google Drive logo]

To learn more about the change, you can read Google's blog post at https://workspaceupdates.googleblog.com/2021/06/drive-file-link-updates.html. This update follows a similar change to make old unlisted YouTube videos private, which we have covered previously on Data Horde.

How to Archive or Scrape MediaFire Files using mf-dl
Thu, 24 Jun 2021 14:10:54 +0000

MediaFire is home to millions of files! MediaFire's generous upload limits appeal to visual artists who can upload their work in higher resolutions, to composers and remixers who want to host their WIP music off-platform, and really to anyone who wants to upload big .zip files.

Unfortunately, MediaFire doesn't have a search or discovery feature, relying entirely on search engine traffic and external linking. There is a lot of undiscovered material on MediaFire, and Pyxia's mf-dl is one of the first tools we have for exploring it. Read on to learn how to install and use mf-dl to easily download MediaFire files and crawl undiscovered corners of the internet!


Installation

mf-dl follows the usual steps for setting up a Python tool:

  1. Install Python 3 if you don’t already have it on your device.
  2. Clone the mf-dl repository from https://gitgud.io/Pyxia/mf-dl.git using a git client. Alternatively download the repo and unzip it.
  3. Using a terminal, cd into the mf-dl directory and run python3 -m pip install -r requirements.txt. (The full command sequence is summarized below.)
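For reference, assuming you are cloning with git into your current working directory, the whole setup boils down to:

> git clone https://gitgud.io/Pyxia/mf-dl.git
> cd mf-dl
> python3 -m pip install -r requirements.txt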

Downloading Files with mfdl.py

mfdl.py is a bulk-downloader for MediaFire links. You may have found these links yourself, copied them from your bookmarks or possibly scraped them beforehand. At any rate, mfdl.py will download the contents and metadata for a list of links that have already been collected.

The input is a list of links, in any file, separated by spaces, new lines or commas. Ideally, you might want to use a spreadsheet-friendly CSV file. For this tutorial, copy the table below into Excel, or another spreadsheet editor, and save it as links.csv.

https://www.mediafire.com/file/y1s9a51a941h7b8/The_Wasteland_%2528MP3%2529.mp3/file
https://www.mediafire.com/folder/xb49obyqfut8d/Merlin_Das_Trevas
https://www.mediafire.com/file/ngteu63n26rhncj/readmetxt4999251.zip/file
links.csv

Next we will need an output directory to save mf-dl's grabs. mf-dl cannot create new directories on its own, so you will have to create a new folder if the destination doesn't already exist. For demonstration's sake, we will create an output directory under mf-dl.

If you have been following along, your mf-dl folder should look a little something like this.
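Roughly speaking (this is only a sketch of the layout, assuming the file names used in this guide; the exact repository contents may differ), the folder should contain:

mf-dl/
├── mfdl.py
├── web_crawler.py
├── requirements.txt
├── links.csv
└── output/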

To run mfdl.py, execute the following command from inside your terminal. mfdl.py will then begin downloading the contents of the input links into the output directory.

> python3 mfdl.py output links.csv

Protip #1: Increasing Download Throughput
mfdl.py can download several files concurrently. By default, mfdl.py runs 6 threads, meaning it will run up to 6 downloads at a time. If you have high network bandwidth, you might want to increase the number of threads to maximize your download speed. Or, if MediaFire is upset with your frequent downloads and is throwing CAPTCHAs your way, you can decrease your thread count. Use this modified version of the mfdl.py call to change your thread count.

> python3 mfdl.py --threads NEWTHREADCOUNT output links.csv

Protip #2: Multiple Input Files
All arguments after the output directory are treated as input files. If you have links split across several files, you can simply append them to the end of the command.

> python3 mfdl.py output links.csv links2.csv links3.csv

Scraping MediaFire links with web_crawler.py

web_crawler.py is a utility for discovering new MediaFire links. That's right: links, not files. web_crawler.py does not download the corresponding files, so we will later need to feed the output links into mfdl.py.

Setting up web_crawler.py is a bit more straightforward. All we need is a seed URL to initiate the crawl; any site with downloadables will make for a nice link farm. In this case we'll be using the Minecraft Pocket Edition downloads site https://mcpedl.com/ as our seed.

To run web_crawler.py, execute the following. Note that web_crawler.py will run indefinitely as new links are discovered, until its execution is interrupted.

> python3 web_crawler.py https://mcpedl.com/ links_found.txt

Protip #1: Feeding Back Links
You can feed links found by web_crawler.py into mfdl.py with

> python3 mfdl.py output links_found.txt

In fact, if you're familiar with Crontab, you can schedule periodic mfdl.py jobs to download new links as they are added to links_found.txt. This way, you can continue to download new links without ever stopping web_crawler.py.
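As a rough sketch (the path /home/user/mf-dl and the 30-minute interval are placeholders to adjust for your own setup), such a crontab entry might look like:

*/30 * * * * cd /home/user/mf-dl && python3 mfdl.py output links_found.txt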

Protip #2: Depth Control
You can limit web_crawler.py's search by specifying a filter. If you want to keep your search to mcpedl.com, ignoring out-links to Facebook and the like, you can pass --filter https://mcpedl.com.

> python3 web_crawler.py --filter https://mcpedl.com https://mcpedl.com/ links_found.txt

Alternatively, you can specify the --regex option if you would rather filter with a regular expression instead.
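For example, assuming --regex takes the pattern as its value in the same way --filter takes a URL prefix (check the tool's help output for the exact syntax), a crawl restricted to mcpedl.com might look like:

> python3 web_crawler.py --regex "^https://mcpedl\.com/" https://mcpedl.com/ links_found.txt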

Protip #3: Thread Control
web_crawler.py can also run on multiple threads, 6 by default. You can choose the maximum number of threads you want to use by, again, specifying the --threads option.

> python3 web_crawler.py --threads NEWTHREADCOUNT https://mcpedl.com/ links_found.txt

Have any more questions? To learn more about MediaFire archiving, check out the MediaFlare project!
