This project focuses on scraping the web using Python and is adapted from the book Automate the Boring Stuff with Python.

Web scraping, also known as web harvesting or web data extraction, is the process of extracting information (typically in HTML format) from websites. It is used to rank websites by traffic and to track and rank products and prices, and it can be useful to data scientists who want to model data from the web.

For this project we leverage Python 3, including the modules sys, requests, and webbrowser.

1. The Code

Create a new Python file and type the following code into it.

#! python3
# opens and scrapes a site of your choosing
# execute in the command line:
#   python3 

# importing required modules
import sys, requests, webbrowser

# declaring command-line argument as variable
# this argument should be webpage to be scraped
siteToScrape = ''.join(sys.argv[1:])

# opening webpage in default browser
webbrowser.open(siteToScrape)

# scraping source code for webpage
res = requests.get(siteToScrape)

# checking for errors (raises an exception if the download failed)
res.raise_for_status()

# creating file to dump scraped source code
playFile = open('scrapeMent.txt', 'wb')

# dumping scraped source code into file in "chunks" of 100000 bytes at a time
for chunk in res.iter_content(100000):
    playFile.write(chunk)

# closing the file so all data is flushed to disk
playFile.close()

The following sections explain how to install the required software and run your script.
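The file-writing step of the script can also be sketched with a with block, which closes the file even if a write fails partway through. This is a minimal sketch, not the script itself: the helper name save_chunks, the demo file name, and the stand-in chunk data are all hypothetical, and the list of bytes takes the place of res.iter_content(100000) so the example runs without a network connection.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of bytes objects to path and return the byte count."""
    total = 0
    # The with block closes the file even if a write raises an exception.
    with open(path, "wb") as out_file:
        for chunk in chunks:
            out_file.write(chunk)
            total += len(chunk)
    return total


# Stand-in data; in the script this would be res.iter_content(100000).
demo_chunks = [b"<html>", b"<body>hello</body>", b"</html>"]
demo_path = os.path.join(tempfile.gettempdir(), "scrapeMent_demo.txt")
written = save_chunks(demo_chunks, demo_path)
print(written)
```

Because the helper accepts any iterable of bytes, the same function could be handed the chunks from a real response without changes.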

2. Install Python 3 and Required Modules

You will need to install Python 3 to run this project. Download it at

Of the three modules the script imports, sys and webbrowser are part of the Python standard library and are installed along with Python 3. Only requests needs to be installed separately. We recommend using pip for this. Search for how to install pip for Python 3 on your operating system, then run the following command in your terminal to install the required package:

sudo pip3 install requests
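To see which of the modules are already importable before installing anything, a small check like the following may help. It is a sketch under the assumption that a module counts as available when Python can locate it; the helper name missing_modules is hypothetical. Note that sys and webbrowser normally ship with Python itself, so they should typically be reported as available.

```python
import importlib.util


def missing_modules(names):
    """Return the names in `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["sys", "requests", "webbrowser"]
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```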

3. Execute The Script

To execute the script, enter the following command into your terminal:

python3 URL

Make sure to replace URL with the URL of a website you would like to scrape. For example, if you'd like to scrape, then you would enter:




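The script hands its argument directly to requests.get, which raises an error (requests.exceptions.MissingSchema) when the URL lacks a scheme such as https://. Below is a minimal sketch of a guard, assuming you want to default to https. The helper name with_scheme and the example.com address are hypothetical placeholders and not part of the original script.

```python
def with_scheme(url, default_scheme="https"):
    """Return url unchanged if it names a scheme, else prepend default_scheme."""
    url = url.strip()
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


# example.com is a placeholder domain, not a site named in this project.
print(with_scheme("example.com"))
print(with_scheme("http://example.com/page"))
```

In the script, the result of such a helper could be assigned to siteToScrape before the page is opened and downloaded.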