SCRAPE A WEBPAGE

WEB SCRAPING WITH PYTHON 3
PYTHON 3

EASY

last hacked on Jul 22, 2017

This project focuses on scraping the web using Python, and is adapted from the book "Automate the Boring Stuff with Python". Web scraping, also known as web harvesting or web data extraction, refers to the process of extracting information (typically in HTML format) from websites. Web scraping is used to rank websites by traffic and to track and rank products and prices, and it can be useful to data scientists who are looking to model data from the web.

# The Code

Create a new file called `scrapeSite.py`. Then type the following code into your newly created file.

    #! python3
    #----------------------------------------------------------
    # scrapeSite.py opens and scrapes a site of your choosing
    #----------------------------------------------------------
    # execute in the command line:
    #
    #     python3 scrapeSite.py <URL>
    #----------------------------------------------------------

    # importing required modules
    import sys, requests, webbrowser

    # declaring command-line argument as variable
    # this argument should be webpage to be scraped
    siteToScrape = ''.join(sys.argv[1:])

    # opening webpage in default browser
    webbrowser.open(siteToScrape)

    # scraping source code for webpage
    res = requests.get(siteToScrape)

    # checking for errors
    res.raise_for_status()

    # creating file to dump scraped source code
    playFile = open('scrapeMent.txt', 'wb')

    # dumping scraped source code into file in
    # "chunks" of 100000 bytes at a time
    for chunk in res.iter_content(100000):
        playFile.write(chunk)

    # closing the file so all data is written to disk
    playFile.close()

Learn to install the required software and run your script in the following sections.

# Install Python 3 and Required Modules

You will need to install `Python 3` to run this project. Download it at [python.org](https://www.python.org/downloads/).

The script uses the `Python 3` modules `sys`, `requests`, and `webbrowser`. The `sys` and `webbrowser` modules are part of the standard library and ship with `Python 3`, so only `requests` needs to be installed. We recommend you use `pip` to install it. Search for how to install `pip` for `Python 3` on your **operating system**. Then run the following command on your terminal to install the required package:

    $ pip3 install requests

# Execute The Script

To execute the script, enter the following command into your terminal: `python3 scrapeSite.py <URL>`. Make sure to replace `<URL>` with the URL of a website you would like to scrape. The script opens the page in your default browser and saves its source code to `scrapeMent.txt` in the current directory.

For example, if you'd like to scrape Inertia7.com, then you would enter:

    $ python3 scrapeSite.py http://www.inertia7.com
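
If you would like to reuse the download step from another script, the chunked write can be pulled into a small helper. The following is only a sketch under a few assumptions: the names `save_chunks` and `scrape` are hypothetical and do not appear in the original project, and `requests` is imported inside `scrape` so the helper can be tried without it. It keeps the same 100000-byte chunk size and output file name, and uses a `with` block so the file is closed automatically.

```python
# NOTE: save_chunks and scrape are hypothetical helper names used for
# illustration; they are not part of the original scrapeSite.py.

def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    total = 0
    # the with block closes the file even if a write fails
    with open(path, 'wb') as out_file:
        for chunk in chunks:
            out_file.write(chunk)
            total += len(chunk)
    return total


def scrape(url, path='scrapeMent.txt', chunk_size=100000):
    """Download url and save its source code to path, chunk by chunk."""
    # imported here so save_chunks can be used without requests installed
    import requests

    res = requests.get(url)
    # raise an exception if the download failed (for example, a 404)
    res.raise_for_status()
    return save_chunks(res.iter_content(chunk_size), path)
```

Calling `scrape('http://www.inertia7.com')` should then save the page source to `scrapeMent.txt` and return the number of bytes written, assuming the site is reachable.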
