GEOSPATIAL DATA ANALYSIS: FACTORS OF HAPPINESS

CHOROPLETH PLOTTING USING PLOTLY

PYTHON

MEDIUM

last hacked on Feb 07, 2018

The purpose of this project is to investigate the effects of six variables on happiness within a country and to visualize the results in an interesting and descriptive way. The data comes from the 2015 World Happiness Report [(archived report)](http://worldhappiness.report/wp-content/uploads/sites/2/2015/04/WHR15_Sep15.pdf), a UN-commissioned undertaking intended to identify and analyze socioeconomic indicators of population wellness. The graphics and plots are built with the matplotlib, seaborn, and plotly packages.
# Factors of Happiness Data Visualization

### Import Packages

```python
# basics:
import pandas as pd
import scipy
import numpy as np
import seaborn as sns  # heatmap, stripplot
from csv import DictReader

# plots:
import matplotlib.pyplot as plt
import plotly.graph_objs as go  # choropleth / mixed subplot graphs
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
# plotly.plotly moved to the separate chart_studio package in plotly 4
# and is not used below, so it is no longer imported
from plotly.graph_objs import *

# machine learning:
# train_test_split lives in sklearn.model_selection
# (the old sklearn.cross_validation module has been removed)
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
```

### Read in Data

After loading the packages, read in the data from a csv file and rename the columns. You can get the csv file here: [happy15.csv](https://github.com/adonovan7/WHR_DataVis/blob/master/happy15.csv)

```python
happy = pd.read_csv('~/happy15.csv', sep=',')
happy.columns = ['country', 'region', 'rank', 'score', 'std_err', 'gdp',
                 'fam', 'life', 'free', 'gov', 'gen', 'dyst']
```

## Basic Exploratory Analysis

### Summary Statistics and Data Properties

We can look at the first few observations and a few summary statistics by using the `head` and `describe` functions:

```python
happy.head()
```

| obs | country | region | rank | score | std_err | gdp | fam | life | free | gov | gen | dyst |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 |
| 1 | Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.43630 | 2.70201 |
| 2 | Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 |
| 3 | Norway | Western Europe | 4 | 7.522 | 0.03880 | 1.45900 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 |
| 4 | Canada | North America | 5 | 7.427 | 0.03553 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 |

```python
happy.describe()
```

```python
print('Region & Number of Observations: \n')
print(happy['region'].value_counts())
```

```
Region & Number of Observations:

Sub-Saharan Africa                 40
Central and Eastern Europe         29
Latin America and Caribbean        22
Western Europe                     21
Middle East and Northern Africa    20
Southeastern Asia                   9
Southern Asia                       7
Eastern Asia                        6
North America                       2
Australia and New Zealand           2
```

### Introductory Data Visualization

Next, let's start building some plots to look at the relationships between different variables. Here are a few definitions:

1. **Scatterplots:** Plot the relationship between two variables
2. **Correlation Matrix:** Plots the strength of the relationship (or correlation) between each pair of variables
    * Uses different color shades or "heat" to represent strength
    * Lighter colors indicate a positive relationship, while darker colors represent a negative or *inverse* relationship
3. **Strip Plot:** Plots individual response values along one axis, grouped by a categorical variable, for comparison between groups
4. **Choropleth Plots:** Plot geographical areas with varying colors and shadings to represent the measurement of some variable

First, we will plot two scatterplots using GDP as an explanatory variable:

```python
# GDP (x) against life expectancy (y)
plt.scatter(happy['gdp'], happy['life'])
plt.xlabel('GDP')
plt.ylabel('Life Expectancy')
plt.title('GDP and Life Expectancy Relationship')
plt.show()

# GDP (x) against happiness rank (y)
plt.scatter(happy['gdp'], happy['rank'])
plt.xlabel('GDP')
plt.ylabel('Happiness Rank')
plt.title('GDP and Rank Relationship')
plt.show()
```

<img src='https://github.com/adonovan7/WHR_DataVis/blob/master/Images/image1.png?raw=true'>

<img src='https://github.com/adonovan7/WHR_DataVis/blob/master/Images/image2.png?raw=true'>

Next let's look at a correlation matrix for all of the variables.
We set the method equal to Pearson to calculate *Pearson's correlation coefficient*, which is the covariance of two variables divided by the product of their standard deviations:

```python
# correlations are only defined for the numeric columns,
# so country and region are excluded first
corrmap = happy.select_dtypes(include='number').corr(method='pearson')
sns.heatmap(corrmap, vmax=.8, square=True)
plt.title('Heatmap of Variable Correlations')
plt.show()
```

<img src='https://github.com/adonovan7/WHR_DataVis/blob/master/Images/image3.png?raw=true'>

And here is a strip plot for the ranks by region:

```python
a = sns.stripplot(x="region", y="rank", data=happy, jitter=True)
plt.xticks(rotation=90)
plt.title('Strip Plot of Ranks by Region')
plt.show()
```

<img src='https://github.com/adonovan7/WHR_DataVis/blob/master/Images/image4.png?raw=true'>

## Advanced Data Visualization

### Mapping Color Scale

First we create a list of `[position, color]` pairs that maps color codes to different marks on the scale:

```python
scl = [[0.0, 'rgb(26,50,49)'], [0.2, 'rgb(52,100,98)'], [0.4, 'rgb(70,134,132)'],
       [0.6, 'rgb(137,193,191)'], [0.8, 'rgb(178,228,227)'], [1.0, 'rgb(238,246,246)']]
```

### Flat Choropleth Plot by Happiness Score

* Plotting a heatmap with geographic data (aka a choropleth map) using plotly
* The darker the coloring, the higher the happiness score
* Plotted with a flat or 'cylindrical' map projection, which is fairly standard

```python
meta_data = dict(type='choropleth',
                 locations=happy['country'],
                 colorscale=scl,
                 reversescale=True,  # flips scl so that high scores are dark
                 locationmode='country names',
                 z=happy['score'],
                 marker=dict(line=dict(color='black', width=1)),
                 text=happy['country'],
                 # hex colors need the leading '#'
                 hoverlabel=dict(bgcolor='#D68C24',
                                 font=dict(family='Times New Roman', color='white')),
                 colorbar={'title': 'Happiness Score', 'nticks': 9})

layout = dict(title='Global Choropleth Plot By Happiness Score',
              geo=dict(showframe=True,
                       projection={'type': 'mercator'}))  # projection names are lowercase

My_choromap = go.Figure(data=[meta_data], layout=layout)
iplot(My_choromap)
```

<iframe width="900" height="800" frameborder="0" scrolling="no" src="//plot.ly/~adonovan7/3.embed"></iframe>

## Multi-Plot Visualization

* 3D globe with choropleth mapping
* Bar plot of happiness scores by region

```python
choro_data = dict(
    geo='geo',  # draw this trace on the layout's "geo" subplot (configured below)
    type='choropleth',
    locations=happy['country'],  # plots the countries on our map
    locationmode='country names',  # how entries in `locations` are matched (use country names for a world map)
    z=happy['score'],  # the variable we are measuring
    marker=dict(line=dict(color='black', width=1)),  # country outlines
    text=happy['country'],  # hover text
    hoverlabel=dict(bgcolor='#D68C24',
                    font=dict(family='Times New Roman', color='white')))  # hover text style

bar_data3 = go.Bar(
    x=happy['region'],
    y=happy['score'],
    marker=dict(color='rgb(12,80,110)', line=dict(color='green')))

layout = {
    "plot_bgcolor": 'white',
    "paper_bgcolor": 'white',
    "title": {"text": "Regions By Happiness Score",
              "font": {"size": 45, "family": "Raleway"}},
    "font": {"color": 'black'},
    "margin": {"r": 10, "t": 80, "b": 40, "l": 10},  # margins around the entire box
    "showlegend": False,
    # the bar chart takes the left part of the figure
    "xaxis": {"anchor": "y", "domain": [0.05, 0.5], "tickangle": 50},
    "yaxis": {"title": "Happiness Score", "anchor": "x",
              "domain": [0.35, 0.95], "showgrid": False},
    "dragmode": "zoom",
    # the globe takes the right part of the figure
    "geo": {
        "domain": {"x": [0.55, 0.95], "y": [0.25, 0.95]},
        "projection": {"type": "orthographic"},
        "lakecolor": "rgba(127,205,255,1)",
        "oceancolor": "rgb(17,122,152)",
        "landcolor": 'white',
        "scope": "world",
        "showlakes": True,
        "showocean": True,
        "showland": True,
        "bgcolor": 'white'}}

data3 = [choro_data, bar_data3]
fig3 = go.Figure(data=data3, layout=layout)
iplot(fig3)
```

<iframe width="900" height="800" frameborder="0" scrolling="no" src="//plot.ly/~adonovan7/1.embed"></iframe>
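
The bar trace in the multi-plot figure is given one score per country rather than one value per region. If a single average per region is wanted, one possible approach is to aggregate with a pandas `groupby` before building the trace. The sketch below is only an illustration: `sample` is a small stand-in built from the five rows shown by `happy.head()`, and `region_means` is a hypothetical helper name, not part of the original project.

```python
import pandas as pd

# Stand-in for the `happy` DataFrame: the five rows printed by happy.head()
sample = pd.DataFrame({
    'country': ['Switzerland', 'Iceland', 'Denmark', 'Norway', 'Canada'],
    'region': ['Western Europe'] * 4 + ['North America'],
    'score': [7.587, 7.561, 7.527, 7.522, 7.427],
})


def region_means(df):
    """Return the mean happiness score per region, highest first."""
    return df.groupby('region')['score'].mean().sort_values(ascending=False)


means = region_means(sample)
bar_x = means.index.tolist()      # one label per region
bar_y = means.round(3).tolist()   # one averaged score per region
print(dict(zip(bar_x, bar_y)))
```

With the full data, `region_means(happy)` would return one value for each of the ten regions, and `bar_x` and `bar_y` could be passed as `x` and `y` to the bar trace in place of `happy['region']` and `happy['score']`.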
