LEARNING PANDAS

DATA MANIPULATION & ANALYSIS

PYTHON 3, PANDAS

MEDIUM

last hacked on Sep 18, 2018

Pandas

https://pandas.pydata.org/

What is pandas?

pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

Why pandas?

  • high-level data structures, such as the DataFrame, for data manipulation with integrated indexing

  • tools for reading and writing data between in-memory data structures and different formats: CSV text files, SQL databases, HDF5 format, spreadsheets...

  • intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form

  • flexible reshaping and pivoting of datasets

  • merging and joining of datasets

  • time-series functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging, etc.
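To make the alignment and missing-data bullet above concrete, here is a minimal sketch. The two Series and their labels are made-up illustration data, not part of any real dataset.

```python
import pandas as pd

# Two Series with partially overlapping labels (hypothetical data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Addition aligns on labels, not on position.
# "x" has no partner in b, so the result for "x" is NaN (missing).
total = a + b

# Missing values can then be filled (or dropped with total.dropna()).
filled = total.fillna(0.0)
print(filled)
```

Whether to fill or drop missing values depends on the analysis; filling with 0.0 here is only for demonstration.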


Implementing pandas

Install pandas

python3 -m pip install --upgrade pandas

Loading Data

import pandas as pd
df = pd.read_csv('./datasets/dataset.tsv', sep='\t') # df is now a pandas DataFrame
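If you do not have a dataset file at hand, the same call can be tried against an in-memory string. This is a sketch only: the column names and values below are invented stand-ins for the contents of the TSV file.

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for a real .tsv file.
tsv_text = "name\tage\tscore\nAda\t36\t9.5\nLinus\t28\t8.0\n"

# read_csv accepts a file-like object, so StringIO works in place of a path.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(df.head())  # show the first rows
```

The first line of the text is treated as the header row, which is the default behavior of read_csv.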

Get datatype:

print(type(df))

Returns:

<class 'pandas.core.frame.DataFrame'>

Get count of rows and columns:

print(df.shape) # notice that shape is an attribute of df, and not a method

Returns:

(x, y)

where x is the number of rows and y is the number of columns.
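Because shape is a plain tuple, it can be unpacked directly. A small sketch, using a made-up DataFrame for illustration:

```python
import pandas as pd

# Hypothetical 3-row, 2-column DataFrame.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is an attribute holding a (rows, columns) tuple, so no parentheses.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# len(df) gives the row count as well.
row_count = len(df)
```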

Get names of columns:

print(df.columns)

Returns:

Index(['column_1', 'column_2', '...', 'column_n'], dtype='object')
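The returned Index behaves much like a sequence of labels. The sketch below, on a made-up two-column DataFrame, shows a few common things to do with it; the new name "label" is purely illustrative.

```python
import pandas as pd

# Hypothetical DataFrame with two columns.
df = pd.DataFrame({"column_1": ["a", "b"], "column_2": [1, 2]})

# Convert the Index to a plain Python list.
names = df.columns.tolist()

# Check whether a column exists.
has_col = "column_2" in df.columns

# Rename a column; rename returns a new DataFrame and leaves df unchanged.
renamed = df.rename(columns={"column_1": "label"})
print(names, has_col, renamed.columns.tolist())
```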

Get the datatype of each column:

print(df.dtypes)

Returns:

column_1     object
column_2      int64
column_3    float64
...
column_n     object
dtype: object
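Once the datatypes are known, they can be used to select or convert columns. The following is a sketch on invented data. Text columns are typically reported as object, although newer pandas versions may report a dedicated string dtype instead.

```python
import pandas as pd

# Hypothetical DataFrame with one text column and two numeric columns.
df = pd.DataFrame(
    {"name": ["Ada", "Linus"], "age": [36, 28], "score": [9.5, 8.0]}
)

# Keep only the numeric columns (integers and floats).
numeric = df.select_dtypes(include="number")

# Convert a single column to another datatype.
df["age"] = df["age"].astype("float64")
print(df.dtypes)
```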
