UTILIZING POSTGRESQL IN PYTHON

UNDERSTANDING DATABASE CONNECTIONS USING PSYCOPG2
2

PYTHON, POSTGRESQL, PANDAS

EASY

last hacked on Apr 30, 2018

## Utilzing PostgreSQL with python3 Here I'll show a quick demonstration on how to utilize python3 and PostgreSQL. # Load Packages You will need `psycopg2` and `pandas` to be able to create a connection and utilize tools many are already familiar with. ``` import psycopg2 import pandas as pd from datetime import datetime ``` # (Optional) Setting Credentials There are many ways to store credentials, here are some options: + `.ini` file (Windows Only) - must gitignore + Standalone python script - must gitignore + `.json` file- must gitignore + text file - must gitignore Let me know if you would like more detail regarding this section. # Creating connection Here I will show how to make a connection with all the credentials written by the user in script. First a connection string is needed: ``` # Set credential variables host = 'localhost' database = 'northwind' user = 'postgres' port = 5432 password = 'postgres' # Make connection string conn_string = "host={0} dbname = {1} user = {2} port = {3} password = {4}"\ .format(host, database, user, port, password) ``` Once these credentials are set you can connect utilizing the following commands: ``` conn = psycopg2.connect(conn_string) my_cursor = conn.cursor() ``` If this runs correctly you are now able to connect to the database. # Querying Database Here I'll demonstrate two ways one can query the database after the connection has been made. ## String We can create a query string with the following syntax within the python script ``` query_string = '''SELECT * FROM Customers''' ``` ## Text file Another option is to utilize a seperate text file which will be loaded in to python. ``` with open('query.txt', 'r') as my_file: query_string = my_file.read() ``` ### Contents of query.txt ``` SELECT * FROM Customers ``` Both serve a certain purpose but the latter helps de-clutter script especially if the query is long. # Connecting query to Connection Here is a simple function that takes in the query string and the connection and produces a data frame based on the query. ``` def open_string(query, cursor): cursor.execute(query) rows = cursor.fetchall() data_frame = pd.DataFrame(rows, columns = list(zip(*cursor.description))[0]) return data_frame ``` The function takes in the query string and executes it. The `fetchall()` command receives all the rows of the query result and returns a list of tuples. We put the contents into a `pandas` data frame where the cursor contains this property called description which contains meta data related to the query. The operation we are utilizing is fetching the column names and setting them as the names for the columns in the resulting data frame. To utilize this you would do the following: ``` customers_df = open_string(query_string, my_cursor) ``` Once you have created this data frame you should be able to preview the data frame to see if it produced the desired query. ``` customers_df.head() ``` # More Advanced Let's say you wanted to abstract a part of your query so you can run it against multiple schemas or time ranges, we can do this by utilizing python string formatting. For this example let's say you wanted to create data frames with separate date ranges without have to do too much data cleanage in python this can be accomplished as such: ``` new_query_string = '''SELECT COUNT(*) FROM Orders WHERE OrderDate BETWEEN %(start_date)s AND %(end_date)s''' ``` Let's go ahead and create a function that will capture and make sure the dates are in the correct format (to avoid potential SQL injections): ``` def get_dates(year, month, day): desired_date = '{:%Y-%m-%d}'.format(datetime(year, month, day)) return desired_date ``` Create the date ranges you would like inside a nested list: ``` my_dates = [[2016, 1, 1], [2017, 1, 1]] start_date16 = get_dates(my_dates[0][0], my_dates[0][1], my_dates[0][2]) # OR start_date16 = get_dates(2016, 1, 1) end_date16 = get_dates(my_dates[1][0], my_dates[1][1], my_dates[1][2]) # OR end_date16 = get_dates(2017, 1, 1) ``` The new `open_string` function would take in parameters as such: ``` def open_string(query, cursor, start_date, end_date): cursor.execute(query, {"start_date": start_date, "end_date": end_date}) rows = cursor.fetchall() data_frame = pd.DataFrame(rows, columns = list(zip(*cursor.description))[0]) return data_frame ``` Now we would run this call as such: ``` open_string(query_string, my_cursor, start_date16, end_date16) ``` And the resulting data frame would contain the respective query and you can create multiple date ranges or parameterizations to make your script run more efficiently!

COMMENTS







keep exploring!

back to all projects