A Linux bash script to download all pdf files from a page

Are you trying to download multiple files from a web page and tired of clicking link after link?

I needed to download about a hundred PDFs from a single web page, so I started looking for a bash script to automate the process and found an interesting article by Guillermo Garron that combines several useful programs into a neat pipeline: the lynx command-line web browser grabs all the links from a page, and wget downloads them.

First, install the browser:

$ sudo apt-get install lynx
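If you are not on a Debian/Ubuntu system, lynx should be available from your distribution's package manager as well, for example:

$ sudo dnf install lynx      # Fedora / RHEL
$ sudo pacman -S lynx        # Arch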

Lynx has a nice feature that lets you grab all the links from a page:

$ lynx --dump http://mlg.eng.cam.ac.uk/pub/

The output will be something like this (the rendered text of the page, followed by a numbered list of every link lynx found):

[screenshot of the lynx dump output]
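For reference, the link list at the end of a lynx dump is formatted roughly like the made-up excerpt below (the file names are placeholders, not the actual contents of that page). Note that each line starts with a number, which is why the next step prints only the second field:

References

   1. http://mlg.eng.cam.ac.uk/pub/
   2. http://mlg.eng.cam.ac.uk/pub/some-paper.pdf
   3. http://mlg.eng.cam.ac.uk/pub/another-paper.pdf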

So we need to strip that numbering column and drop all non-PDF links, so the output is clean and readable by wget:

$ lynx --dump http://mlg.eng.cam.ac.uk/pub/ | awk '/http/{print $2}' | grep pdf > ~/links.txt

Resulting in a clean input for wget:

[screenshot of the filtered list of PDF URLs]
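As a side note, if the page also links to URLs that merely contain the string "pdf" somewhere in the middle, a slightly stricter filter keeps only links that actually end with that extension (assuming the PDF links all end in .pdf):

$ lynx --dump http://mlg.eng.cam.ac.uk/pub/ | awk '/http/{print $2}' | grep -i '\.pdf$' > ~/links.txt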

The last step is to pass this file to wget to download all the PDFs:

$ for i in $( cat ~/links.txt ); do wget "$i"; done

Voilà! All the files are downloaded.

[screenshot of the downloaded PDF files]
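As an aside, wget can also read a list of URLs straight from a file, so the loop above can be replaced with a single call (this assumes links.txt contains one URL per line, which is exactly what the pipeline produces):

$ wget -i ~/links.txt

And if you want the whole thing as one reusable script, here is a minimal sketch that takes the page URL as its first argument (the script name and the temporary file are just examples):

#!/bin/bash
# grab-pdfs.sh: download every PDF linked from the page passed as the first argument
# Usage: ./grab-pdfs.sh http://mlg.eng.cam.ac.uk/pub/
set -e
url="$1"
list="$(mktemp)"     # temporary file to hold the link list

# dump the page, keep only the URLs (2nd field of the numbered link list),
# keep only PDF links, then hand the list to wget
lynx --dump "$url" | awk '/http/{print $2}' | grep -i '\.pdf$' > "$list"
wget -i "$list"

rm -f "$list"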
