How to Archive or Scrape MediaFire Files using mf-dl

By themadprogramer · June 24, 2021 · 37 Comments · Tutorial

MediaFire is home to millions of files! Its generous upload limits appeal to visual artists who can upload their work in higher resolutions, composers and remixers who want to host their WIP music off-platform, and really anyone who wants to upload big .zip files.

Unfortunately, MediaFire doesn’t have a search or discovery feature and relies entirely on search engine traffic and external linking. There are a lot of undiscovered things on MediaFire, and Pyxia’s mf-dl is one of the first tools we have for exploring it. Read on to learn how to install and use mf-dl to download MediaFire files and crawl undiscovered corners of the internet!

Installation

mf-dl follows the usual steps for setting up a Python tool:

Install Python 3 if you don’t already have it on your device.
Clone the mf-dl repository from https://gitgud.io/Pyxia/mf-dl.git using a git client. Alternatively, download the repo and unzip it.
Using a terminal, cd into the mf-dl directory and run python3 -m pip install -r requirements.txt.
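
If you are unsure whether the install step succeeded, a quick check along these lines may help. This is only a sketch: `missing_modules` is a hypothetical helper rather than part of mf-dl, and the module list passed to it is an assumption, so treat requirements.txt as the authoritative list.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # mf-dl targets Python 3, so bail out early on anything older.
    if sys.version_info < (3,):
        sys.exit("Python 3 is required")
    # Assumed dependency list; requirements.txt is the authoritative source.
    missing = missing_modules(["requests"])
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, re-running the pip command from the last step inside the mf-dl directory is the first thing to try.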

Downloading Files with `mfdl.py`

mfdl.py is a bulk-downloader for MediaFire links. You may have found these links yourself, copied them from your bookmarks or possibly scraped them beforehand. At any rate, mfdl.py will download the contents and metadata for a list of links that have already been collected.

The input can be any file containing a sequence of links separated by spaces, newlines or commas. Ideally, you might want to use a spreadsheet-friendly CSV file. For this tutorial, copy the table below into Excel, or another spreadsheet editor, and save it as links.csv.

https://www.mediafire.com/file/y1s9a51a941h7b8/The_Wasteland_%2528MP3%2529.mp3/file

https://www.mediafire.com/folder/xb49obyqfut8d/Merlin_Das_Trevas

https://www.mediafire.com/file/ngteu63n26rhncj/readmetxt4999251.zip/file

links.csv
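
To make the accepted input format concrete, here is a minimal sketch of how a file like links.csv could be split into individual links. It assumes nothing about mf-dl's internals: `split_links` is a hypothetical helper that simply treats spaces, newlines and commas as separators, as described above.

```python
import re


def split_links(text):
    """Split raw text into links on any run of whitespace or commas."""
    return [token for token in re.split(r"[\s,]+", text) if token]


sample = (
    "https://www.mediafire.com/folder/xb49obyqfut8d/Merlin_Das_Trevas,\n"
    "https://www.mediafire.com/file/ngteu63n26rhncj/readmetxt4999251.zip/file"
)
links = split_links(sample)
print(len(links))  # 2
```

Because every separator is treated alike, a one-column spreadsheet export and a plain text file with one link per line give the same result.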

Next, we need an output directory to save mf-dl’s grabs. mf-dl does not have permission to create new directories, so you will have to create the folder yourself if the destination doesn’t already exist. For demonstration’s sake, we will create an output directory under mf-dl.
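
Since the output directory has to exist before mfdl.py runs, you may prefer to create it from a script rather than by hand. Here is a minimal sketch using only the standard library; `ensure_dir` is a hypothetical helper and not something mf-dl provides.

```python
from pathlib import Path


def ensure_dir(path):
    """Create `path` (and any missing parents) if needed; return it as a Path."""
    directory = Path(path)
    # exist_ok=True makes the call safe to repeat on an existing folder.
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Calling `ensure_dir("output")` from inside the mf-dl folder would produce the directory used in the rest of this tutorial.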

If you have been following along, your mf-dl folder should look a little something like this.

To run mfdl.py, execute the following command from inside your terminal and mfdl.py will begin downloading the contents of the input links into the output directory.

> python3 mfdl.py output links.csv

Protip #1: Increasing Download Throughput
mfdl.py can download several files concurrently. By default, mfdl.py runs 6 threads, which means it will run up to 6 downloads simultaneously. If you have high network bandwidth, you might want to increase the number of threads to maximize your download speed. If, on the other hand, MediaFire is upset with your frequent downloads and is throwing CAPTCHAs your way, you can decrease your thread count. Use this modified version of the mfdl.py call to change your thread count.

> python3 mfdl.py --threads NEWTHREADCOUNT output links.csv
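
For intuition about what the thread count controls, here is a generic worker-pool sketch. It is not mfdl.py's actual code: `fake_download` stands in for a real download and `download_all` is a hypothetical helper. The pool never has more than `threads` downloads in flight at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(link):
    """Stand-in for a real download: wait briefly, then report the link."""
    time.sleep(0.01)
    return link, "ok"


def download_all(links, threads=6, fetch=fake_download):
    """Run `fetch` over `links` with at most `threads` running at a time."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in input order, whatever order they finish in.
        return list(pool.map(fetch, links))


results = download_all([f"link-{i}" for i in range(12)], threads=6)
print(len(results))  # 12
```

Raising `threads` helps only while the network is the bottleneck. Lowering it spaces requests out, which is the point of the CAPTCHA advice above.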

Protip #2: Multiple Input Files
All arguments after the output directory are treated as input files. If you have links split across several files, you can simply append them to the end of the command.

> python3 mfdl.py output links.csv links2.csv links3.csv
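
If the same link may appear in more than one input file, you could merge the files into a single de-duplicated list first. The sketch below uses a hypothetical `merge_link_files` helper. Whether mfdl.py itself skips duplicates is not covered here, so treat this as an optional pre-processing step.

```python
import re
from pathlib import Path


def merge_link_files(paths):
    """Read every file in `paths`; return unique links in first-seen order."""
    seen = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for link in re.split(r"[\s,]+", text):
            if link:
                seen.setdefault(link, None)  # dicts preserve insertion order
    return list(seen)
```

For example, `merge_link_files(["links.csv", "links2.csv", "links3.csv"])` returns one list that can be written to a single file and passed to mfdl.py.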

Scraping MediaFire links with `web_crawler.py`

web_crawler.py is a utility for discovering new MediaFire links. That’s right, links, not files. web_crawler.py does not download the corresponding files, so we will later need to feed the links it outputs into mfdl.py.

Setting up web_crawler.py is a bit more straightforward. We just need a seed URL to initiate the crawl. Any site with downloadables will make for a nice link farm. In this case, we’ll be using the Minecraft Pocket Edition downloads site https://mcpedl.com/ as our seed.

To run web_crawler.py, execute the following. Note that web_crawler.py will run indefinitely as new links are discovered, until its execution is interrupted.

> python3 web_crawler.py https://mcpedl.com/ links_found.txt
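
Conceptually, the core step of a crawler like this is pulling links out of a page and setting aside the ones that point at MediaFire. The sketch below shows that step on a hard-coded HTML string using only the standard library. It is not web_crawler.py's code: `LinkCollector`, `split_page_links`, the host check and the placeholder MediaFire URL are all illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_page_links(page_url, html):
    """Return (mediafire_links, other_links) found in `html`."""
    parser = LinkCollector()
    parser.feed(html)
    mediafire, others = [], []
    for href in parser.hrefs:
        url = urljoin(page_url, href)  # resolve relative links
        host = urlparse(url).netloc.lower()
        if host == "mediafire.com" or host.endswith(".mediafire.com"):
            mediafire.append(url)
        else:
            others.append(url)  # candidates for the crawler to visit next
    return mediafire, others


# Placeholder page content, made up for this example.
page = '<a href="/maps/">Maps</a> <a href="https://www.mediafire.com/file/abc/x.zip/file">DL</a>'
found, to_visit = split_page_links("https://mcpedl.com/", page)
print(found)     # ['https://www.mediafire.com/file/abc/x.zip/file']
print(to_visit)  # ['https://mcpedl.com/maps/']
```

A full crawler would repeat this for every URL in `to_visit`, which is why the run never ends on its own while new pages keep turning up.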

Protip #1: Feeding Back Links
You can feed links found using web_crawler.py into mfdl.py with

> python3 mfdl.py output links_found.txt

In fact, if you’re familiar with crontab, you can schedule periodic mfdl.py jobs to download new links as they are added to links_found.txt. This way, you can continue to download new links without ever stopping web_crawler.py.
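
One way to structure such a periodic job is to remember which links have already been handed to mfdl.py and only pass along the new ones. The sketch below assumes links_found.txt holds links separated by whitespace or commas. `read_links`, `take_new_links` and the links_done.txt bookkeeping file are hypothetical names, not something mf-dl provides.

```python
import re
from pathlib import Path


def read_links(path):
    """Return the links stored in `path`, in file order."""
    text = Path(path).read_text(encoding="utf-8")
    return [link for link in re.split(r"[\s,]+", text) if link]


def take_new_links(found_path, done_path):
    """Return links from `found_path` not yet in `done_path`, then record them."""
    done_file = Path(done_path)
    done = set(read_links(done_file)) if done_file.exists() else set()
    # dict.fromkeys drops duplicates while keeping first-seen order.
    new = [link for link in dict.fromkeys(read_links(found_path)) if link not in done]
    if new:
        with done_file.open("a", encoding="utf-8") as handle:
            handle.write("\n".join(new) + "\n")
    return new
```

A scheduled job could write the returned links to a batch file and run mfdl.py on that file. Note that this sketch records links before they are downloaded, so a failed download would not be retried automatically.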

Protip #2: Depth Control
You can limit web_crawler.py’s search by specifying a filter. If you want to keep your search to mcpedl.com, ignoring out-links to Facebook etc., you can pass --filter https://mcpedl.com.

> python3 web_crawler.py --filter https://mcpedl.com https://mcpedl.com/ links_found.txt

Alternatively, you can specify the --regex option if you would rather filter with regular expressions instead.
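
To illustrate the difference between the two options, here is a sketch of a prefix filter next to a regular-expression filter. How web_crawler.py matches URLs internally is an assumption here (a prefix match for --filter, `re.search` for --regex), and `should_follow` is a hypothetical helper.

```python
import re


def should_follow(url, prefix=None, pattern=None):
    """Decide whether a crawler should visit `url` under the given restriction."""
    if prefix is not None:
        return url.startswith(prefix)
    if pattern is not None:
        return re.search(pattern, url) is not None
    return True  # no restriction: follow everything


print(should_follow("https://mcpedl.com/maps/", prefix="https://mcpedl.com"))       # True
print(should_follow("https://facebook.com/mcpedl", prefix="https://mcpedl.com"))    # False
print(should_follow("https://mcpedl.com/maps/", pattern=r"^https://mcpedl\.com/"))  # True
```

A regular expression is the more flexible choice when the crawl should cover several hosts or only certain paths.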

Protip #3: Thread Control
web_crawler.py can also run on multiple threads, 6 by default. You can choose the maximum number of threads you want to use by, again, specifying the --threads option.

> python3 web_crawler.py --threads NEWTHREADCOUNT https://mcpedl.com/ links_found.txt

Have any more questions? To learn more about MediaFire archiving, check out the MediaFlare project!

Last updated on June 24, 2021

37 Comments

Gram

July 24, 2021, 11:20 am

How do I retain the directory structure of the scrape? It’s all there but completely unsorted.
Thompson

February 6, 2022, 1:30 pm

This is too much for me to comprehend, I’m just trying to get an old Minecraft map back…
- Bush
  
  December 4, 2022, 8:27 am
  
  same here
- redr
  
  July 14, 2023, 12:40 pm
  
  dude i’m in search for some rare stardew valley portraits same thing
Pam

February 16, 2022, 11:08 pm

So grateful for your work here, all the more puzzled after I installed the latest Python, unzipped mf-dl and entered the cd command only to get “cd: no such file or directory: mf-dl-master” when it’s clearly unzipped and ready. Then I noticed I must install certifi, so I entered “$ pip install certifi” and got “zsh: command not found: $”. All this just to download my own MediaFire files now hiding behind a paywall popup.
- Anonymous
  
  February 20, 2022, 1:50 am
  
  when youre inside the folder just click the address bar and type cmd to open a command window from that folder
aids

April 20, 2022, 3:29 pm

any noob meinkrafters, if you get the python3 is not recognized thingy, try just “py” instead of “python3”
Asdf

May 13, 2022, 10:09 pm

Many of the files won’t download, at all. For example, even when I was using the tutorial links, the terminal would return “Couldn’t find download URL” and continue executing…
- gay
  
  May 16, 2022, 6:47 am
  
  same
- MrCool
  
  May 24, 2022, 3:17 pm
  
  The code needs to be updated for it to work. Just find download_link_prefix and change it to
  
  download_link_prefix = '\nPreparing your download…\n<a class="input popsok" aria-label="Download file" href="'
  
  I would send a PR but the repository is on some weird site I won't have time to sign up there.
  - themadprogramer
    
    May 24, 2022, 4:20 pm
    
    But I can! PR’d! https://gitgud.io/Pyxia/mf-dl/-/merge_requests/3
- MrCool
  
  May 24, 2022, 3:18 pm
  
  just need to adjust download_link_prefix a little due to html changes on mediafire.com. I would send a PR but the code is hosted on some weird site i don’t have account for.
Nora

May 26, 2022, 9:38 am

None of the files will download. The terminal just returns “Couldn’t find download URL”.
- themadprogramer
  
  May 26, 2022, 4:24 pm
  
  hey! just as MrCool said, there was a layout update to MediaFire and that broke the tool.
  
  I’m waiting on Pyxia to publish my pull request to update it, but until then you can follow the workaround MrCool wrote by replacing a line in the mfdl.py code. Just open it in any text editor, edit and save!
  - giorno420
    
    June 29, 2022, 11:39 am
    
    heyy, so is this fixed?
  - Trev
    
    December 2, 2024, 1:38 am
    
    hey friend,
    the most recent method isn’t working for some reason – I think it’s because link is going to a secondary page. Can you assist?
  - Anonymous
    
    December 5, 2024, 7:04 pm
    
    Are you able to update this? It’s currently not finding the URL
MigginBach

May 26, 2022, 10:53 pm

It says it’s downloading files, but they’re not being sent to the output folder I made in the mf-dl directory. Where is it sending them? Did I do something wrong somewhere? The only thing that appears in there is a 0 KB custom folders txt file.
8BitMiller

May 26, 2022, 11:18 pm

I’ve used the workaround and everything seems to act like it’s downloading stuff, but nothing is actually going to the output folder, not even the tutorial links. Are they for whatever reason going somewhere else, or did I just not apply the workaround right?
Vorporal

June 4, 2022, 11:39 pm

The last suggestion didn’t work for me.

Got it to work by using:
download_link_prefix = '<a class="input popsok" aria-label="Download file" href="'

Thanks MrCool and Pyxia
- Omega
  
  June 22, 2022, 9:12 am
  
  This worked! Thanks
Nath

June 14, 2022, 7:26 pm

Line 38 in mfdl.py change it to :

download_link_prefix = '\n<a class="input popsok" aria-label="Download file" href="'
- HI
  
  February 10, 2024, 1:16 am
  
  Worked perfectly, thanks!
giorno420

June 30, 2022, 4:53 am

anyone getting the error after a while:
requests.exceptions.ReadTimeout : HTTPSConnectionPool(host='www.mediafire.com', port=443): Read timed out. (read timeout = 30)

anyone getting this? does this skip the file or just retry the file to download for later? thanks!
Anonymous

August 9, 2022, 12:06 pm

i keep getting the “Unable to create process using ‘/bin/env python3 mfdl.py output links.csv'” when i try to run it. what do i do?
- glmdgrielson
  
  August 25, 2022, 11:12 am
  
  That sounds like A. you’re running the script as just mfdl.py ... and B. your computer doesn’t recognize python3 as a command. Look into your package manager, maybe?
Carlos

September 8, 2022, 3:26 am

For now, try to use this:
https://github.com/NicKoehler/mediafire_bulk_downloader
This worked perfectly for my downloads…

Only one “problem”, but it is easy to solve: if “gazpacho” is not found, install it and it works perfectly.
https://pypi.org/project/gazpacho/
Anonymous

October 1, 2022, 9:52 pm

It looks like the html has been updated again, latest PR should fix it and also give folks an idea of what to modify locally to get it working:

https://gitgud.io/Pyxia/mf-dl/-/merge_requests/3
Anonymous

August 18, 2023, 5:24 pm

any got this working in 2023?
anon

August 18, 2023, 5:25 pm

anyone got this working in 2023?
- Anonymous
  
  August 21, 2023, 9:07 am
  
  Not for me.
  - Anonymous
    
    November 20, 2023, 1:57 pm
    
    me neither
Daylan Allen

June 13, 2024, 11:22 am

Hey there! I stumbled across this tool today and realized the download_link_prefix was having issues again, a little help from our new friend GPT and I have a working version now.

Make sure to install Beautifulsoup4 – `pip install beautifulsoup4`

In mfdl.py – Find the ‘find_direct_url’ function and replace it with this:

# Add these imports at the top of mfdl.py if they are not already there
import re
import requests
from bs4 import BeautifulSoup

def find_direct_url(info_url):
    # HTTP_HEADERS and TIMEOUT_T are expected to be defined elsewhere in mfdl.py
    rq = requests.get(info_url, headers=HTTP_HEADERS, timeout=TIMEOUT_T)
    web_html = rq.text
    soup = BeautifulSoup(web_html, 'html.parser')

    # Find the download link
    download_link = soup.find('a', {'class': 'input popsok', 'aria-label': 'Download file'})
    if download_link is None:
        return {"success": 0}

    direct_url = download_link['href']

    # Find the uploaded location
    # (string= is the current name for the older text= argument)
    uploaded_from_tag = soup.find('p', string=re.compile('This file was uploaded from '))
    if uploaded_from_tag is None:
        return {"url": direct_url, "location": "Unknown", "success": 1}

    uploaded_from = uploaded_from_tag.text
    location = uploaded_from.split(" on ")[0].replace("This file was uploaded from ", "")

    return {"url": direct_url, "location": location, "success": 1}

This uses BS4 to parse the webpage and look for the href download link we want, avoiding the manual way of just searching for the exact download_link_prefix. I would make a commit or merge, but it seems the 2 maintainers aren’t active on there anymore. Hope this helps anyone using the tool! Everything else in the tutorial (at least for downloading folders) should work fine.
- Kayem
  
  August 19, 2024, 8:41 am
  
  For anyone using this method, remember to add “from bs4 import BeautifulSoup” at the beginning! Or else soup = BeautifulSoup(web_html, ‘html.parser’) won’t work.
  - Anonymous
    
    September 1, 2024, 8:42 am
    
    OHH THANKS.
- Anonymous
  
  September 1, 2024, 8:40 am
  
  OHHH THANKS. You also need to add: from bs4 import BeautifulSoup.
Kiran vfx

October 29, 2024, 1:46 am

BeautifulSoup way also didn’t work in oct-2024 🙁
