Wenyan Deng
Ph.D. Candidate
Massachusetts Institute of Technology
GeoPandas: Static Maps and Spatial Join
July 20, 2017
The case study
My first major mapping project at the University of Chicago is a choropleth map of Sri Lankan Army (SLA) officer deaths between 1981 and 1999. The data is collected painstakingly by my colleague, Winston Berg. I am deeply grateful to Professor Paul Staniland, who made the project possible by hiring me. Any mistakes remain my own. You can see a copy of the codes here and the final products on my GitHub repository. The final product is shown in Figure 1.
The data
The data table includes the names of army officers who died as well as their place of death, which may be a village or city. In order to assign each place of death a larger geographical category — province, district, or division — we must attach a longitude and a latitude to each place in the dataset. Thankfully, GeoPandas has just the function.
import geopandas as gpd
import pandas as pd
df1=gpd.tools.geocode(['Ampara', 'Anuradhapura', 'Badulla', 'Batticaloa', 'Colombo', '...'])
df1['geometry'] = df1['geometry'].astype(str)
df2 = df1['geometry'].str.strip('POINT ( )').str.split(' ', expand=True).rename(columns={0:'Longitude',1:'Latitude'})
df3=df1.merge(df2, left_index=True, right_index=True) df3.to_csv("out.csv")
"out.csv" describes deaths data in which every place name is matched with a longitude and latitude.
Figure 1: Static map with GeoPandas
Basemap Shapefile
Go to http://www.diva-gis.org/gdata and download the shapefile for Sri Lanka. Province level, district level, and administrative division levels are available. A later blog will explain how to make shapefiles based on existing ones.
Get Mapping
Now that we have both the data and the shapefile base map, we can start mapping! In this example, I use Sri Lankan districts.
Open a jupyter notebook in your working directory and import the following:
%matplotlib inline
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
Read your shapefile into a geo-dataframe:
geo_df = gpd.read_file("data1/LKA_adm1.shp")
geo_df.set_index(geo_df["ID_1"].astype(int), inplace = True)
geo_df.head()
Plot it:
geo_df.plot()
Make a new column called coordinates, which is based on the "geometry" column:
geo_df['coords'] = geo_df['geometry'].apply(lambda x: x.representative_point().coords[:])
geo_df['coords'] = [coords[0] for coords in geo_df['coords']]
Label the districts:
print(geo_df.crs)
#try 2163 (albers), 3857 (web), 4269 (plate)
ax = geo_df.to_crs(epsg=4269).plot()
ax.set_axis_off()
for idx, row in geo_df.iterrows():
plt.annotate(s=row['NAME_1'], xy=row['coords'],
horizontalalignment='center')
Spatial Join
To make a heat/choropleth map, we have to count how many deaths there are per district. For this, we use GeoPanda's spatial join function. The spatial join function matches points in the deaths dataset to the polygons in the geo-dataframe. Here, I use "within". You can also use "contains " or "intersects".
from shapely.geometry import Point
deaths_df = pd.read_excel("SLA_RoH_Deng.xlsx", usecols = [8, 9])
deaths_df.dropna(inplace = True)
geometry = [Point(xy) for xy in zip(deaths_df.Longitude, deaths_df.Latitude)]
deaths_coords = gpd.GeoDataFrame(deaths_df, crs = geo_df.crs, geometry=geometry)
located_deaths = gpd.sjoin(deaths_coords, geo_df, how = 'left', op = 'within')
located_deaths = located_deaths.dropna(axis=1, how='all')
Plot it:
located_deaths.plot()
Rename the spatially joined table so the column heads make sense. Next, count and sum up the deaths in each district. Save it as an Excel sheet and check for formatting and errors:
located_deaths.rename(columns = {"NAME_1" : "Districts", "index_right" : "deaths_per_district"}, inplace = True)
deaths_geo_count = located_deaths.groupby("Districts").count()[["deaths_per_district"]] deaths_geo_count.to_excel("deaths_geo_count.xlsx")
Import the Excel sheet and draw a bar graph with Pandas:
deaths_geo_count = pd.read_excel("deaths_geo_count.xlsx")
deaths_geo_count.set_index("Districts")["deaths_per_district"].sort_values(ascending = False).plot(kind = "bar", figsize = (15, 3))
Rename the column in the geo-dataframe so it matches with the deaths dataframe:
geo_df.rename(columns = {"NAME_1" : "Districts"}, inplace = True)
Merge the geo-dataframe with the deaths dataframe:
geo_merge = geo_df.merge(deaths_geo_count, on='Districts', how='left')
geo_merge col_name =geo_merge.columns[0]
geo_merge.head()
Figure 2: SLA officer deaths by year
Now we can produce the map. The default for "scheme" is quantiles. You can also use "equal_interval" or "fisher_jenks". Fisher_jenks sets categories by minimizing in-group variance and maximizing inter-group variance.
ft = "deaths_per_district"
plate = geo_merge.to_crs(epsg=4269)
ax = plate.plot(column = ft, scheme = "quantiles", k = 9, cmap = "Reds", legend = True, alpha = 0.7, linewidth = 0.5, figsize = (50, 25))
ax.set_title("Sri Lankan Army Fatalities by District", fontsize = 30)
ax.set_axis_off()
for idx, row in geo_df.iterrows():
plt.annotate(s=row['Districts'], xy=row['coords'], horizontalalignment='center')
Your final product would look like Figure 1. "Alpha" in the codes above changes the shade of colors. Of course, you can also count the deaths by year and create a series of maps like Figure 2 shown on the left.
Happy mapping!