LEARNING PANDAS

DATA MANIPULATION & ANALYSIS

PYTHON 3, PANDAS

MEDIUM

last hacked on Feb 10, 2019

Pandas

https://pandas.pydata.org/

What is pandas?

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Why pandas?

  • high-level data structures, such as the DataFrame, for data manipulation with integrated indexing

  • tools for reading and writing data between in-memory data structures and different formats: CSV text files, SQL databases, HDF5 format, spreadsheets...

  • intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form.

  • flexible reshaping and pivoting of datasets

  • merging and joining of datasets

  • time-series functionality: date range generation and frequency conversion, moving window stats, moving window linear regressions, date shifting and lagging, etc.

What is a DataFrame?

A DataFrame is the main object in pandas. It represents data as rows and columns, i.e. tabular data like an Excel spreadsheet.
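As a minimal sketch (the column names and values below are made up for illustration), a DataFrame can be built directly from a dictionary, where each key becomes a column:

```python
import pandas as pd

# Hypothetical data: each dict key becomes a column,
# and each list holds that column's values, row by row
df = pd.DataFrame({
    "day": ["2017-01-01", "2017-01-02", "2017-01-03"],
    "temperature": [32, 35, 28],
    "event": ["Rain", "Sunny", "Snow"],
})

print(df)
# Rows are labeled by a default integer index
print(df.index.tolist())  # [0, 1, 2]
```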


Implement with pandas

Set up environment

Install pandas

python3 -m pip install --upgrade pandas

Import pandas

import pandas as pd

Import a CSV file as a pandas DataFrame

df = pd.read_csv("file_path")
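Because "file_path" above is only a placeholder, here is a self-contained sketch that feeds read_csv some made-up CSV text through io.StringIO; with a real file you would pass its path instead:

```python
import io
import pandas as pd

# Hypothetical CSV contents standing in for a file on disk
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
"""

# read_csv accepts any file-like object, so StringIO works like an open file
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```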

Explore data

type(df): get datatype

print(type(df))

Returns:

<class 'pandas.core.frame.DataFrame'>

df.shape: get count of rows and columns

print(df.shape) # notice that shape is an attribute of df, and not a method

Returns:

(x, y)

where x is the number of rows and y is the number of columns.
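A quick sketch on a tiny hypothetical DataFrame, showing that shape is a tuple that can be unpacked and that it agrees with len():

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5.0, 6.0, 7.0, 8.0]})

# shape is a (rows, columns) tuple, so it can be unpacked directly
rows, cols = df.shape
print(rows, cols)  # 4 2

# len(df) also counts rows; len(df.columns) counts columns
assert rows == len(df) and cols == len(df.columns)
```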

df.columns: get names of columns

print(df.columns)

Returns:

Index(['column_1', 'column_2', ..., 'column_n'], dtype='object')

df.dtypes: get datatypes of each column

print(df.dtypes)

Returns:

column_1  object
column_2  int64
column_3  float64
...
column_n  object

df['x'].max(): get a column's maximum value

df["column_name"].max()

df['x'].mean(): get a column's mean (average) value

df["column_name"].mean()
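A small sketch with made-up temperatures; note that max() and mean() skip missing values by default (skipna=True), so the gap below does not affect the results:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing reading
df = pd.DataFrame({"temperature": [32, 35, 28, np.nan]})

print(df["temperature"].max())            # 35.0
print(round(df["temperature"].mean(), 2)) # 31.67, the mean of 32, 35 and 28
```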

Data Munging / Data Wrangling

The process of cleaning messy data is called data munging or data wrangling.

Problem

We have NA values in our df

df.fillna(0, inplace=True)
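A sketch with hypothetical data contrasting the two ways of calling fillna(): without inplace it returns a filled copy and leaves df alone, while inplace=True modifies df directly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"temperature": [32, np.nan, 28], "windspeed": [6, 7, np.nan]})

# Without inplace: fillna returns a new, filled DataFrame and df keeps its NaNs
filled = df.fillna(0)
print(int(df.isna().sum().sum()))      # 2
print(int(filled.isna().sum().sum()))  # 0

# With inplace=True: df itself is filled
df.fillna(0, inplace=True)
print(int(df.isna().sum().sum()))      # 0
```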

Handling Missing Data

import pandas as pd

# parse_dates converts the 'day' column into datetime values
df = pd.read_csv('weather_data.csv', parse_dates=['day'])

# use the dates as the row index
df.set_index('day', inplace=True)
df
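weather_data.csv itself isn't included here, so this sketch builds a hypothetical stand-in from CSV text with the same kind of columns (day, temperature, windspeed, event) and a few gaps. Empty fields are read as NaN:

```python
import io
import pandas as pd

# Hypothetical stand-in for weather_data.csv; empty fields become NaN
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/4/2017,,9,Sunny
1/5/2017,28,,Snow
1/6/2017,,7,
1/7/2017,32,,Rain
"""

# parse_dates turns the 'day' strings into datetimes, which then become the index
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"])
df.set_index("day", inplace=True)

# Count of missing values per column
print(df.isna().sum())
```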

df.fillna()

Filling all NAs with 0

new_df = df.fillna(0)
new_df

Specifying filling value by column

new_df = df.fillna({
    'temperature': 0,
    'windspeed': 0,
    'event': 'no event'
})
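A sketch with made-up values; columns left out of the dictionary keep their missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [32, np.nan, 28],
    "windspeed": [np.nan, 9, 7],
    "event": ["Rain", np.nan, "Snow"],
})

# Only 'event' is listed, so only its NaN is replaced
new_df = df.fillna({"event": "no event"})
print(new_df["event"].tolist())            # ['Rain', 'no event', 'Snow']
print(int(new_df["temperature"].isna().sum()))  # 1: unlisted column untouched
```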

Forward filling

new_df = df.ffill()  # same as the older df.fillna(method='ffill'), which recent pandas versions deprecate
new_df

Backward filling

new_df = df.bfill()  # same as the older df.fillna(method='bfill'), which recent pandas versions deprecate
new_df

The .fillna() method also has a useful limit argument: set it to an integer to cap how many consecutive missing values get filled, whether you are forward or backward filling.
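The same limit argument works with the ffill() and bfill() shortcuts (equivalent to fillna(method='ffill') and fillna(method='bfill')). A sketch on a made-up Series with a three-value gap:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# Forward fill copies the last valid value downward
print(s.ffill().tolist())         # [1.0, 1.0, 1.0, 1.0, 5.0]
# Backward fill copies the next valid value upward
print(s.bfill().tolist())         # [1.0, 5.0, 5.0, 5.0, 5.0]
# limit=1 fills at most one consecutive NaN in each gap
print(s.ffill(limit=1).tolist())  # [1.0, 1.0, nan, nan, 5.0]
print(s.bfill(limit=1).tolist())  # [1.0, nan, nan, 5.0, 5.0]
```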

Interpolation

Interpolation estimates missing values from the existing values around them, a more sophisticated approach than copying a neighbor.

new_df = df.interpolate()
new_df

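By default (method="linear"), it ignores the index and treats the values as evenly spaced, drawing a straight line between the existing values on either side of a gap. A single missing value between two known values, for example, gets their average.

Other methods can be passed as arguments. method="time", for example, takes the actual dates in a DatetimeIndex into account, so unevenly spaced days are weighted correctly.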
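A sketch contrasting the default linear method with method="time", on hypothetical readings where the missing value sits three days after one known reading and one day before the next:

```python
import numpy as np
import pandas as pd

# Hypothetical readings: Jan 1 = 30, Jan 4 = missing, Jan 5 = 40
idx = pd.DatetimeIndex(["2017-01-01", "2017-01-04", "2017-01-05"])
s = pd.Series([30.0, np.nan, 40.0], index=idx)

linear = s.interpolate()              # treats the points as evenly spaced
timed = s.interpolate(method="time")  # uses the real day gaps in the index

print(linear.iloc[1])  # 35.0, halfway between 30 and 40
print(timed.iloc[1])   # 37.5, three quarters of the way (3 of 4 days)
```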

Drop NA

new_df = df.dropna()
new_df

By default, dropna() drops every row that contains at least one missing value; its arguments let you change that rule.
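A sketch of three of dropna()'s arguments (how, thresh, subset) on a made-up DataFrame whose rows have 3, 2, 1 and 0 non-missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "temperature": [32, np.nan, np.nan, np.nan],
    "windspeed": [6, 9, np.nan, np.nan],
    "event": ["Rain", "Sunny", "Snow", np.nan],
})

print(len(df.dropna()))                      # 1: default drops rows with any NaN
print(len(df.dropna(how="all")))             # 3: drops only the all-NaN row
print(len(df.dropna(thresh=2)))              # 2: keeps rows with at least 2 non-NaN values
print(len(df.dropna(subset=["windspeed"])))  # 2: only checks the windspeed column
```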

Insert missing dates

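dt = pd.date_range("01-01-2017", "01-11-2017")  # every day from Jan 1 to Jan 11, 2017
idx = pd.DatetimeIndex(dt)  # date_range already returns a DatetimeIndex, so this step is optional
df = df.reindex(idx)  # dates missing from df are added as rows of NaN
df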
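A self-contained sketch with hypothetical dates: two days are missing from the index, and reindex adds them back as rows of NaN. Note that reindex requires the existing index to have no duplicate labels.

```python
import pandas as pd

# Hypothetical daily data with Jan 2 and Jan 3 missing
df = pd.DataFrame(
    {"temperature": [32, 35, 28]},
    index=pd.DatetimeIndex(["2017-01-01", "2017-01-04", "2017-01-05"]),
)

# A DatetimeIndex covering every day in the span
full_range = pd.date_range("2017-01-01", "2017-01-05")
df = df.reindex(full_range)

print(len(df))                              # 5
print(int(df["temperature"].isna().sum()))  # 2: the inserted dates have no data yet
```

The new NaN rows can then be handled with any of the filling or interpolation methods above.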

COMMENTS


https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.filter.html
by david | 5 months, 1 week ago
https://github.com/codebasics/py/tree/master/pandas
by david | 5 months, 1 week ago
https://www.youtube.com/embed/CmorAWRsCAw
by david | 5 months, 1 week ago
This [Pandas Tutorial (Data Analysis In Python)](https://www.youtube.com/playlist?list=PLeo1K3hjS3uuASpe-1LjfG5f14Bnozjwy) by **codebasics** is pretty rad. Check it out!




