Wenyan Deng
Ph.D. Candidate
Massachusetts Institute of Technology
Pandas and Plotly: Interactive Bubble Plots
July 22, 2017
The case study
If you have more than one variable, you might want a scatterplot of two of them that changes over time. For instance, how might Sri Lanka's electoral turnout relate to the Sri Lankan Army's fatalities by province? In this post, I demonstrate how to make a bubble plot that reflects the number of registered voters, death count, and voter turnout. The layout of this site may mess up the indentation of the loops in this code, so make sure your indentation is correct! I've also posted a copy of the code in .html format (with correct loop indentation) on my GitHub repository.
The final product looks something like this: https://plot.ly/~wdeng1/132.embed.
Getting started
Get an account (free or otherwise) with plotly. Remember your username and write down your API key somewhere. Open a Jupyter Notebook and import the following:
import plotly.plotly as py
from plotly.grid_objs import Grid, Column
from plotly.tools import FigureFactory as figure_factory
import pandas as pd
import time
import plotly
import json
import requests
from requests.auth import HTTPBasicAuth
username = '...' # Replace with your username
api_key = '...' # Replace with your API key
auth = HTTPBasicAuth(username, api_key)
headers = {'Plotly-Client-Platform': 'python'}
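If you would rather not paste your API key into the notebook itself, one option is to read both values from environment variables. The snippet below is only a minimal sketch: the variable names PLOTLY_USERNAME and PLOTLY_API_KEY and the helper load_credentials are made up for illustration and are not something plotly defines.

```python
import os


def load_credentials(env=None):
    """Return (username, api_key) read from environment-style variables.

    PLOTLY_USERNAME and PLOTLY_API_KEY are hypothetical names chosen for
    this sketch; use whatever names you export in your own shell.
    """
    env = os.environ if env is None else env
    username = env.get("PLOTLY_USERNAME")
    api_key = env.get("PLOTLY_API_KEY")
    if not username or not api_key:
        raise RuntimeError("Set PLOTLY_USERNAME and PLOTLY_API_KEY first")
    return username, api_key
```

With the two variables exported before starting Jupyter, `username, api_key = load_credentials()` would replace the two hard-coded assignments above.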
The Data
Read your data file. Here, I combined my SLA deaths file, mentioned previously, with data on electoral turnout and number of registered voters, by electoral district. I use "gapminder" in the plot and grid names below because the result looks like a mock Gapminder plot.
plotly.tools.set_credentials_file(username=username, api_key=api_key)
dataset = pd.read_excel("SLA_electoral.xls")
dataset.head()
table = figure_factory.create_table(dataset.head(10))
py.iplot(table, filename='animations-gapminder-data-preview')
Your dataset should have columns for year, deaths, district, province, registered voters (regvoter), and turnout.
Plotting
Get the sorted list of unique years, as strings:
years_from_col = set(dataset['year'])
years_ints = sorted(list(years_from_col))
years = [str(year) for year in years_ints]
Make a list of provinces:
provinces = []
for province in dataset['province']:
    if province not in provinces:
        provinces.append(province)
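Both of these steps can be tried out on plain Python lists before touching the real spreadsheet. The values below are toy stand-ins for dataset['year'] and dataset['province'], not my actual data, and the dict.fromkeys one-liner is just an alternative way to write the province loop.

```python
# Toy stand-ins for dataset['year'] and dataset['province'] (made-up values).
year_col = [1994, 1988, 1989, 1988, 1999, 1994]
province_col = ["Western", "Northern", "Western", "Eastern", "Northern", "Uva"]

# Unique years, sorted ascending, then converted to strings for the slider.
years = [str(y) for y in sorted(set(year_col))]

# dict.fromkeys keeps the first occurrence of each key in insertion order,
# which gives the same result as the "if province not in provinces" loop.
provinces = list(dict.fromkeys(province_col))

print(years)
print(provinces)
```

Keeping the provinces in first-seen order (rather than sorting them) means the legend follows the order of the spreadsheet.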
Make the plotly grid:
columns = []
for year in years:
    for province in provinces:
        dataset_by_year = dataset[dataset['year'] == int(year)]
        dataset_by_year_and_cont = dataset_by_year[dataset_by_year['province'] == province]
        for col_name in dataset_by_year_and_cont:
            column_name = '{year}_{province}_{header}_gapminder_grid'.format(
                year=year, province=province, header=col_name
            )
            a_column = Column(list(dataset_by_year_and_cont[col_name]), column_name)
            columns.append(a_column)
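To see what the grid will contain before uploading anything, the same naming scheme can be reproduced with plain dictionaries. This is a rough sketch on made-up rows (the district names and numbers are placeholders), and build_columns is a hypothetical helper, not a plotly function.

```python
COL_TEMPLATE = "{year}_{province}_{header}_gapminder_grid"

# Made-up rows; the keys mirror the column names used in this post.
rows = [
    {"year": 1988, "province": "Western", "district": "District A",
     "deaths": 3, "turnout": 55.0, "regvoter": 1000},
    {"year": 1988, "province": "Western", "district": "District B",
     "deaths": 5, "turnout": 60.0, "regvoter": 2000},
    {"year": 1989, "province": "Northern", "district": "District C",
     "deaths": 8, "turnout": 20.0, "regvoter": 1500},
]


def build_columns(rows, years, provinces, headers):
    """Map each grid column name to the list of values it would hold."""
    columns = {}
    for year in years:
        for province in provinces:
            subset = [r for r in rows
                      if r["year"] == int(year) and r["province"] == province]
            for header in headers:
                name = COL_TEMPLATE.format(year=year, province=province, header=header)
                # A year/province pair with no rows still gets an (empty) column.
                columns[name] = [r[header] for r in subset]
    return columns


columns = build_columns(rows, ["1988", "1989"], ["Western", "Northern"], list(rows[0]))
print(columns["1988_Western_deaths_gapminder_grid"])
```

Every year and province pair gets one column per header, even when no districts match, so the later lookups by column name always find something.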
Upload the grid:
grid = Grid(columns)
url = py.grid_ops.upload(grid, 'gapminder_grid'+str(time.time()), auto_open=False)
Make the figure:
figure = { 'data': [], 'layout': {}, 'frames': [], 'config': {'scrollzoom': True} }
Fill in the layout:
figure['layout']['xaxis'] = {'range': [-10, 150], 'title': 'SLA Fatalities', 'gridcolor': '#FFFFFF'}
figure['layout']['yaxis'] = {'range': [-10, 100], 'title': 'Electoral Turnout (%)', 'gridcolor': '#FFFFFF'}
figure['layout']['hovermode'] = 'closest'
figure['layout']['plot_bgcolor'] = 'rgb(223, 232, 243)'
Make the slider. The first assignment below is a template that shows the available keys with placeholder values; the second fills in the real values and overwrites it:
figure['layout']['slider'] = {
    'args': [
        'slider.value', {
            'duration': 400,
            'ease': 'cubic-in-out'
        }
    ],
    'initialValue': 'first-value-for-slider',
    'plotlycommand': 'animate',
    'values': [1988, 1989, 1994, 1999],
    'visible': True
}
figure['layout']['slider'] = {
    'args': [
        'slider.value', {
            'duration': 400,
            'ease': 'cubic-in-out'
        }
    ],
    'initialValue': '1988',
    'plotlycommand': 'animate',
    'values': years,
    'visible': True
}
figure['layout']['updatemenus'] = [
    {
        'buttons': [
            {
                'args': [None, {'frame': {'duration': 500, 'redraw': False},
                                'fromcurrent': True,
                                'transition': {'duration': 300, 'easing': 'quadratic-in-out'}}],
                'label': 'Play',
                'method': 'animate'
            },
            {
                'args': [[None], {'frame': {'duration': 0, 'redraw': False},
                                  'mode': 'immediate',
                                  'transition': {'duration': 0}}],
                'label': 'Pause',
                'method': 'animate'
            }
        ],
        'direction': 'left',
        'pad': {'r': 10, 't': 87},
        'showactive': False,
        'type': 'buttons',
        'x': 0.1,
        'xanchor': 'right',
        'y': 0,
        'yanchor': 'top'
    }
]
# Template showing the structure of one slider, with placeholder text.
# It is overwritten by [sliders_dict] further down.
figure['layout']['sliders'] = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'text-before-value-on-display',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9, 'x': 0.1, 'y': 0,
    'steps': [{
        'args': [
            [1988],
            {'frame': {'duration': 300, 'redraw': False},
             'mode': 'immediate',
             'transition': {'duration': 300}}
        ],
        'label': "Year: 1988",
        'method': 'animate'
    }]
}
sliders_dict = {
    'active': 0,
    'yanchor': 'top',
    'xanchor': 'left',
    'currentvalue': {
        'font': {'size': 20},
        'prefix': 'Year:',
        'visible': True,
        'xanchor': 'right'
    },
    'transition': {'duration': 300, 'easing': 'cubic-in-out'},
    'pad': {'b': 10, 't': 50},
    'len': 0.9,
    'x': 0.1,
    'y': 0,
    'steps': []
}
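The empty 'steps' list gets one entry per year later on, and each entry has the same shape. As a rough sketch of that shape, here is a small helper that builds the whole list at once; make_slider_steps and its duration_ms parameter are hypothetical names for illustration, not part of plotly.

```python
def make_slider_steps(years, duration_ms=300):
    """Build one 'animate' slider step per year.

    The first element of 'args' is the list of frame names to jump to,
    so it is written as a string to match the frame names used later.
    """
    steps = []
    for year in years:
        steps.append({
            "args": [
                [str(year)],
                {"frame": {"duration": duration_ms, "redraw": False},
                 "mode": "immediate",
                 "transition": {"duration": duration_ms}},
            ],
            "label": str(year),
            "method": "animate",
        })
    return steps


steps = make_slider_steps([1988, 1989, 1994, 1999])
print([step["label"] for step in steps])
```

Converting the year to a string in both 'args' and 'label' keeps the step aligned with a frame named str(year), whether the input years are integers or strings.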
Set colors for the legend and define size reference for bubbles:
custom_colors = {
    'Eastern': 'rgb(51, 153, 255)',
    'Central': 'rgb(255, 51, 255)',
    'Northern': 'rgb(153, 51, 255)',
    'North Central': 'rgb(102, 178, 255)',
    'Western': 'rgb(204, 153, 255)',
    'Sabaragamuwa': 'rgb(255, 153, 255)',
    'North Western': 'rgb(255, 102, 255)',
    'Southern': 'rgb(178, 102, 255)',
    'Uva': 'rgb(153, 204, 255)'
}
col_name_template = '{year}_{province}_{header}_gapminder_grid'
year = 1988
for province in provinces:
    data_dict = {
        'xsrc': grid.get_column_reference(col_name_template.format(
            year=year, province=province, header='deaths')),
        'ysrc': grid.get_column_reference(col_name_template.format(
            year=year, province=province, header='turnout')),
        'mode': 'markers',
        'textsrc': grid.get_column_reference(col_name_template.format(
            year=year, province=province, header='district')),
        'marker': {
            'sizemode': 'area',
            'sizeref': 1.5,
            'sizesrc': grid.get_column_reference(col_name_template.format(
                year=year, province=province, header='regvoter')),
            'color': custom_colors[province]
        },
        'name': province
    }
    figure['data'].append(data_dict)
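I set 'sizeref' to 1.5 by hand above. If your bubbles come out far too large or too small, Plotly's bubble chart documentation suggests a rule of thumb for 'sizemode': 'area', namely twice the largest size value divided by the square of the largest marker diameter you want, in pixels. A minimal sketch of that calculation follows; area_sizeref and max_marker_px are names I made up, and the sizes are toy numbers.

```python
def area_sizeref(sizes, max_marker_px=40):
    """Rule-of-thumb sizeref for markers scaled by area.

    sizes: the raw values behind the marker sizes (e.g. registered voters).
    max_marker_px: desired diameter of the largest bubble, in pixels.
    """
    return 2.0 * max(sizes) / (max_marker_px ** 2)


# Toy values only; substitute your own regvoter column.
print(area_sizeref([100, 400, 800], max_marker_px=40))
```

A larger 'sizeref' shrinks every bubble, so computing it from the data keeps the biggest district at a predictable size.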
Build one frame per year, add a slider step for each, and plot:
frame = {'data': [], 'name': "1988"}
figure['layout']['sliders'] = [sliders_dict]
for year in years:
    frame = {'data': [], 'name': str(year)}
    for province in provinces:
        data_dict = {
            'xsrc': grid.get_column_reference(col_name_template.format(
                year=year, province=province, header='deaths')),
            'ysrc': grid.get_column_reference(col_name_template.format(
                year=year, province=province, header='turnout')),
            'mode': 'markers',
            'textsrc': grid.get_column_reference(col_name_template.format(
                year=year, province=province, header='district')),
            'marker': {
                'sizemode': 'area',
                'sizeref': 1.5,
                'sizesrc': grid.get_column_reference(col_name_template.format(
                    year=year, province=province, header='regvoter')),
                'color': custom_colors[province]
            },
            'name': province
        }
        frame['data'].append(data_dict)
    figure['frames'].append(frame)
    slider_step = {'args': [
        [year],
        {'frame': {'duration': 300, 'redraw': False},
         'mode': 'immediate',
         'transition': {'duration': 300}}
    ],
        'label': year,
        'method': 'animate'}
    sliders_dict['steps'].append(slider_step)
figure['layout']['sliders'] = [sliders_dict]
graph = py.icreate_animations(figure, 'SLA_fatalities_turnout'+str(time.time()))
graph
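Because the frames and the slider steps are matched purely by name, it can be worth checking that every step points at a frame that exists before you upload. The check below is only a sketch on a toy figure dictionary; unmatched_slider_steps is a hypothetical helper, not a plotly function.

```python
def unmatched_slider_steps(figure):
    """Return slider step targets that do not match any frame name."""
    frame_names = {frame["name"] for frame in figure.get("frames", [])}
    unmatched = []
    for slider in figure.get("layout", {}).get("sliders", []):
        for step in slider.get("steps", []):
            targets = step["args"][0]  # list of frame names to animate to
            unmatched.extend(t for t in targets if str(t) not in frame_names)
    return unmatched


# Toy figure: two frames, but three slider steps.
toy_figure = {
    "frames": [{"data": [], "name": "1988"}, {"data": [], "name": "1989"}],
    "layout": {"sliders": [{"steps": [
        {"args": [["1988"], {}], "label": "1988", "method": "animate"},
        {"args": [["1989"], {}], "label": "1989", "method": "animate"},
        {"args": [["1994"], {}], "label": "1994", "method": "animate"},
    ]}]},
}

print(unmatched_slider_steps(toy_figure))
```

An empty list means every step has a frame to animate to; anything else names the years whose frames are missing.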
We're done! See the completed plot below. Hover to see the labels. Drag on chart area to zoom, double click to zoom out. Drag on slider to see different years, or click play. Click on legend to isolate provinces. Happy plotting!