Project #1: Python Scripting for Automated Geoprocessing (or, How to Make Magic Happen)

When I first developed the syllabus for my senior research, I spent a lot of time asking fellow GIS students, professors, and analysts where they thought I should begin my study. And nearly everyone gave me the same answer – Python scripting.

Python scripting in GIS is primarily desirable because it allows you to automate almost all of the geoprocessing tasks you do in GIS without so much clicking. Rather than moving tediously from one GUI (Graphical User Interface, those pop up boxes where you tell a tool what files and parameters to use) to the next or attempting to build a flawless model from scratch, a Python script allows you to write code that is flexible, dynamic, and most importantly, adaptable to a variety of contexts. Let's say, for example, that you needed to project 1000 feature classes or run a particular technique on a group of shapefiles 500 times. Doing this manually would likely take days, if not weeks, of work. Conversely, a Python script allows you to direct the computer to a group of files and say, "Hey, could you bring these files into a GIS, run x, y, or z operations on them, and tell me what happens?" For me, Python scripts evoke a sensation of performing magic (particularly when layers you only conceived of creating actually start appearing on screen – ok, that's it for the meta stuff for now).

Another beautiful aspect of Python scripting is the ability to add a lot of creativity to your own geospatial analysis. Before I started scripting with Python, I didn't realize how much unseen functionality the folks at Esri and the QGIS Project had programmed into their geoprocessing tools, how many little details are taken care of for you. On the one hand, this is part of what makes GIS a lot more user friendly and approachable nowadays – you don't have to worry too much about the nitty gritty and you don't need to have a programming background to get started. However, I also think working exclusively with the built-in functionality of ArcGIS or QGIS confines you somewhat to very specific techniques or routines (which, ultimately, produces an inundation of very similar types of maps).

So, for my first project, I decided to tackle what felt like an ambitious project for a self-taught programmer like myself. Could I write a Python script that would make this map?

I made this map towards the end of last semester using data and a workflow courtesy of Professor Jeff Howarth in the Middlebury Geography Department. I did deviate in considerable ways from the original method for making this map, and dressed it up with my own cartography, but it was in many ways a standard GIS procedure for me. I used GUIs, Esri geodatabases, all of the built-in functionality I could get my hands on. The challenge now would be to see if I could do the same without so much support.

Fortunately, I did stumble upon a solid book to help me develop my scripting capabilities quickly, Python Scripting for ArcGIS by Paul A. Zandbergen. The book really focuses on using the ArcPy site package with ArcGIS to write effective, efficient code. It acts more as a conceptual skeleton than a How To guide, giving you the initial tools that you need to start exploring ArcPy and developing your own scripts. For this first project, I really wanted to stick with ArcPy because 1) it has a whole slew of solid methods and functions custom built for geoprocessing, 2) it works seamlessly with standard Python functions (i.e. you can nest 'regular' Python functions into larger ArcPy functions), and 3) there's well written, easy to use documentation available from Esri. At the same time, there is some dissonance between the stated goals of my research and the choice to use ArcPy. It isn't open source (it is built directly into ArcGIS, a very expensive piece of software), it has a good deal of custom functionality that still obscures what's going on "under the hood", and code produced using ArcPy can't be run by folks not using the Esri suite of products. And so I want to take a moment to acknowledge these benefits and drawbacks in my decision, and to recognize that this tension between convenience and accessibility is palpable in the GIS world today and demands further investigation. I will continue to do that investigation throughout this semester. Now, on to the results!

Click here to access the script I used to make this map. Note that this script is specific to the files and databases written to my personal drive at Middlebury. If you want to replicate this map, you will have to grab the data yourself (don't worry, I give you some sources where you can find it in the comments at the beginning of the script).

This script took a fair amount of troubleshooting to actually work through. ArcPy requires very specific syntax and does not like to perform operations on conflicting data types (trust me, I found out the hard way). This is true of most programming languages, but what is perhaps more difficult with ArcPy is that you have to do a fair amount of data type conversion and concatenation to actually get seemingly simple calculations to go through. It's also difficult to iterate through individual feature classes to do specific operations on each of them. You'll notice that I used multiple for loops with lists and indexes to take care of this problem; you could also use cursors, although I find these to be a bit more cumbersome.

Getting this script to work was a remarkably satisfying moment for me. The only items the script starts with are a .mxd file (ArcMap Document file), a shapefile of police beats for the city of Chicago, a shapefile of all Census block groups for the united States, and a .csv file (Comma Separated Value file) with XY coordinate data for arrests for possession of crack cocaine. This means that the script takes care of:

  • Setting Environments
  • Creating a file geodatabase
  • Performing a select by location to get block groups only for the city of Chicago
  • Making an XY Event Layer to display the crime data and writing this data to a new feature class
  • Converting all shapefiles to feature classes in a geodatabase
  • Projecting all feature classes
  • Performing a spatial join
  • Running an areal-weighted reaggregation to get the number of people of each race by police beat across the city (this is the entire second half of the script and encompasses about 8 separate operations...but more on this next post)

My hope is that in making this script available I can help others to write similar scripts that also take on the challenge of visualizing social injustice. While I am proud of the accomplishment, I am more excited about the story that this map tells - one of racially biased policing in a city that has seen too much violence. And so I am reminded that this thing I have made, this script, is only really as valuable as the story it helps to tell and the minds it helps to change. Below, you will find the map that this script produces (Note: you will have to do the symbology manually to get your map to look like mine).

And here is the code, fully written out, that made this map.

# NAME: AWR_Script
# Description: Run an areal-weighted reaggregation (AWR) on crime data and racial Census data for the city of Chicago given a .csv file of crimes, a shapefile for Census block groups, and a shapefile for CPD police beats.
# Author: Parker Ziegler

# Preparatory Steps
# Create a .mxd file to hold your map (i.e. "Chicago_Arrest_Map.mxd")
# Create a .csv file of the point data you're interested in. In this case, I used data showing the locations of arrests for crack cocaine possession in Chicago, courtesy of the Chicago Data Portal.
# You'll need to acquire shapefiles of the City of Chicago police beats as well as Census data. These can be accessed via the Chicago Data Portal and, respectively.
# Place all files and their metadata into your workspace folder.

# Import system modules.

import arcpy
import os

# Set environment workspace, enable file overwrite.

from arcpy import env
env.workspace = r"W:/GG700/Chicago_Arrests"
env.overwriteOutput = True
print "Env is set to Chicago_Arrests."

# Create a file geodatabase to store your data

out_folder_path = r"W:/GG700/Chicago_Arrests"
out_name = "Chicago_Arrest_Data.gdb"
arcpy.CreateFileGDB_management(out_folder_path, out_name)
print "Geodatabase created."

# Before starting analysis, we need to clean up our Census layer. In order to run the SelectLayerByLocation tool we first have to create a layer file from our Census shapefile.

arcpy.MakeFeatureLayer_management("block_groups.shp", "block_groups_lyr")
print "Census layer created."

# Perform a select by location on the Census data to get only the blocks groups that intersect the CPD police beats.

arcpy.SelectLayerByLocation_management("block_groups_lyr", "INTERSECT", "police_beats.shp")
print "Intersecting features selected."

# Write these block groups to a new shapefile.
arcpy.CopyFeatures_management("block_groups_lyr", "chicago_blk_grps.shp")
print "Features copied to shapefile."

# Display the Crack_XYData.csv as points and write a point feature class.
# Set variables.

in_table = "Crack_XYData.csv"
x_coords = "Longitude"
y_coords = "Latitude"
out_layer = "crack_as_points"
saved_layer = "crack_as_points.lyr"
sr = arcpy.SpatialReference("WGS 1984")

# Make the XY Event Layer

arcpy.MakeXYEventLayer_management(in_table, x_coords, y_coords, out_layer, sr)
print "XY Event Layer created."

# Create an output layer file (.lyr) file.

arcpy.SaveToLayerFile_management(out_layer, saved_layer)
print "Layer file created."

# Copy the layer file to a shapefile (.shp).

arcpy.CopyFeatures_management(saved_layer, "crack_as_points.shp")
print "Shapefile successfully created."

# Move your "crack_as_points.shp", your "police_beats.shp", and your "chicago_blk_grps.shp" into your file geodatabase.

in_features = ["crack_as_points.shp", "police_beats.shp", "chicago_blk_grps.shp"]
out_gdb = r"W:/GG700/Chicago_Arrests/Chicago_Arrest_Data.gdb"
arcpy.FeatureClassToGeodatabase_conversion(in_features, out_gdb)
print "Shapefiles have been converted to feature classes in the geodatabase."

# Now, get all your feature classes into the same spatial reference.

env.workspace = out_gdb
infcList = arcpy.ListFeatureClasses()
outfcList = ["chicago_blk_grps_project","crack_as_points_project", "police_beats_project"]
outCS = arcpy.SpatialReference("NAD 1983 UTM Zone 16N")
i = 0
for infc in infcList:
arcpy.Project_management(infcList[i], outfcList[i], outCS)
i += 1

print "All layers are in the same coordinate system."

# By now, your three features classes should be in the same coordinate system. Now, perform a spatial join to get the number of crimes in each police beat.

# Join "crack_as_points_project" (points) to "police_beats_project" (polygons).
# Define variables.

target_features = "police_beats_project"
join_features = "crack_as_points_project"
outfc = "police_beats_crack_joined"

# Run the Spatial Join tool.

arcpy.SpatialJoin_analysis(target_features, join_features, outfc)
print "Crack points have been joined to police beats."

# Performing a spatial join will automatically create a field labeled "Join_Count"; use this field, as well as area information from your police beats feature class, to create a new field that gives arrests / sq. km.

# Add fields for police beat geometry and crime density.

arcpy.AddField_management("police_beats_crack_joined", "area_beats", "FLOAT", 9)
arcpy.AddField_management("police_beats_crack_joined", "arrest_dns", "FLOAT", 9)
print "Two fields have been added."

# Calculate geometry, use this value to calculate a crime density for each police beat.

arcpy.CalculateField_management("police_beats_crack_joined", "area_beats", '!shape.geodesicArea@squarekilometers!', "PYTHON_9.3")
arcpy.CalculateField_management("police_beats_crack_joined", "arrest_dns", 'float(!Join_Count!)/float(!area_beats!)', "PYTHON_9.3")
print "Crime density field has been calculated."

# Write this joined value to a new feature class, as joins are stored only in memory.

arcpy.CopyFeatures_management("police_beats_crack_joined", "arrest_density")
print "Arrest Density feature class written to geodatabase."

# You have now successfully created the choropleth layer representing arrest density for possession of crack cocaine by CPD police beat. The next part of the workflow involves performing an areal weighted reaggregation on our Census data to obtain the number of people living in each police beat by race.

# Calculate the area of each Census block group.

arcpy.AddField_management("chicago_blk_grps_project", "a_blk_grps", "DOUBLE", 10)
arcpy.CalculateField_management("chicago_blk_grps_project", "a_blk_grps", '!shape.geodesicArea@squarekilometers!', "PYTHON_9.3")
print "Area of each block group calculated."

# Intersect the Census block groups with the police beats.

in_features = ["chicago_blk_grps_project", "police_beats_project"]
out_features = "blk_grps_beats_intersect"
arcpy.Intersect_analysis(in_features, out_features)
print "Intersect completed."

# Now, create a field that will divide the areas of the intersected features by the areas of the Census block groups.

arcpy.AddField_management(out_features, "area_ratio", "DOUBLE", 10)
arcpy.CalculateField_management(out_features, "area_ratio", '!shape.geodesicArea@squarekilometers!/!a_blk_grps!', "PYTHON_9.3")
print "Area ratio calculated."

# Next, multiply this ratio by the number of Black, White, Hispanic, Asian, and Multiple Race people in each Census block group. This will give us the number of people of each race in each intersect feature.

# Add five fields, one for each race.

fieldList = ["black_pop", "white_pop", "hisp_pop", "asian_pop", "mult_pop"]
field_type = "DOUBLE"
field_length = 10
for field in fieldList:
arcpy.AddField_management(out_features, field, field_type, field_length)
print "All fields for racial data added."

# Calculate the values for all five fields using the ratio created above and the Census data included from the intersect.

i = 0
for field in fieldList:
arcpy.CalculateField_management(out_features, fieldList[i], '' + '!' + calcList[i] + '!' + '*!area_ratio!', "PYTHON_9.3")
i += 1
print "All fields for racial data calculated."

# The last step in the AWR is to dissolve our intersected features along the boundaries of the police beats. This will reaggregate our racial data by police beats, which we will represent using a dot density.

# Define variables.
in_features = "blk_grps_beats_intersect"
out_features = "race_by_beats"
dissolve_field = "BEAT_NUM"
stats_fields = "black_pop SUM; white_pop SUM; hisp_pop SUM; asian_pop SUM; mult_pop SUM"

# Run the Dissolve tool.

arcpy.Dissolve_management(in_features, out_features, dissolve_field, stats_fields)
print "Dissolve completed."

# The script is now complete. The final map should be composed of two layers – "arrest_density" (a choropleth density of arrests for possession of crack cocaine) and "race_by_beats" (a dot density representation of the racial make up of Chicago). You may also want to add a base map depending on your purposes.
# Use the symbology GUI to change the symbology and set boundaries as you wish. I recommend doing most of your design work in Adobe Illustrator.