Mapping in React and D3

A lot of folks today are talking about React, the Javascript framework that lets you build performant apps and UIs quickly and efficiently. I use React almost every day at my job and in my own open source projects, often pairing it with Redux for state management.

But as a cartographer, my ultimate assessment of any framework comes down to its ability to help me make beautiful, engaging maps and visualizations. I've worked quite a bit with the ArcGIS Javascript API in React to make slippy maps, but I haven't fallen in love with it. D3, on the other hand – which ranks among my favorite libraries out there – is a lot more fun to play with in React. It's also a good bit more difficult to work with, because the two libraries interact with the DOM in fundamentally different ways. React uses a virtual DOM to compute UI updates and renders HTML as a function of props and state, while D3 operates directly on the DOM, adding and removing elements as you please. Both want control over what's on the page, and that tension is what makes combining them tricky. In this piece, I'll go over the two main strategies I've found for working with, and mapping with, D3 and React.

Before we get started, please take a read of one of my favorite pieces on this subject, which will be particularly useful for those of you who are new to React. Done? No really, go read it. It'll help you understand what is to follow. Sweet, on to the good stuff!

When mapping with D3 and React, I like to say that you can either write React code the D3 way or write D3 code the React way. By this I mean that you need to let one of these libraries drive how you handle the DOM in any given situation. Of course, just as nothing in life is an either / or (more often, it's an &), this doesn't preclude you from using both techniques in a single app – on the contrary, I think these two methods are suited to solving different problems. Let's look at two examples, starting with writing React code the D3 way. Our task will be rendering a simple legend for a choropleth map.

// import React
import * as React from 'react';
// import d3 - run npm install --save d3 to add
// d3 to your node_modules
import * as d3 from 'd3';

class MapLegend extends React.Component {
  
  constructor(props) {
    super(props);
    // props, bound methods of your React component
    // etc. can go in here
  }
  
  componentDidMount() {
    // time to make a legend here
    // start by storing all color values in an array
    let colorLegend = ["rgb(247,251,255)", "rgb(222,235,247)", "rgb(198,219,239)", "rgb(158,202,225)", "rgb(107,174,214)", "rgb(66,146,198)", "rgb(33,113,181)", "rgb(8,81,156)"];

    // get the svg that just mounted - this is componentDidMount()
    // so this function gets fired just after render()
    let svgLegend = d3.select('#legend');

    // now just inject standard D3 code
    svgLegend.append("g")
        .attr("transform", "translate(500, 525)")
        .selectAll("rect")
        .data(colorLegend)
        .enter()
        .append("rect")
        .attr("fill", function(d, i){ return colorLegend[i]; })
        .attr("x", function(d, i){ return (i*30); })
        .attr("y", 30)
        .attr("width", 30)
        .attr("height", 20);

    // add a title
    svgLegend.append("text")
        .attr("transform", "translate(500, 525)")
        .attr("font-size", "12px")
        .attr("font-family", "HelveticaNeue-Bold, Helvetica, sans-serif")
        .attr("y", 20)
        .text("Shootings Per Million");

    // add numbers as labels
    let labelsLegend = ["0-1","1-3","3-5","5-7","7-10","10-12","12-15",">15"];
    
    svgLegend.append("g")
        .attr("transform", "translate(500, 525)")
        .selectAll("text")
        .data(labelsLegend)
        .enter()
        .append("text")
        .attr("font-size", "10px")
        .attr("font-family", "HelveticaNeue-Light, Helvetica, sans-serif")
        .attr("x", function(d, i){ return (i*30); })
        .attr("y", 60)
        .text(function(d){ return d; })
  }
  
  render() {
    return <svg id='legend'></svg>;
  }
}

export default MapLegend;

So let's discuss what's going on here. Essentially, we are just using React to render a blank SVG container for us, rather than leaving that to D3. This is useful because it turns this SVG into a component, which – using ES6 classes – has lifecycle methods we can hijack to do anything we want. In this case, using componentDidMount() is the proper choice because it guarantees that the empty SVG is in the DOM before we execute any D3 code to act on it. The contents of the componentDidMount() method should look familiar to anyone who has worked with D3. Here we simply obtain a reference to our SVG in the DOM, append a <g> tag to house the <rect> elements that will make up our legend, add the proper color and text label to each rectangle, and add a title to the legend. Pretty simple, but pretty powerful. I call this writing React code the D3 way because we're really letting D3 drive the way this component renders. React is essentially just providing us an SVG shell into which D3 can put anything we want. Of course, since this legend is now a component, we can generalize it – say by turning the colorLegend or labelsLegend arrays into props that can render anything we hand off to the component.
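
Here's a minimal sketch of what that generalization might look like – the prop names (colors and labels) are my own, not part of the original component, and the sketch assumes the same React and D3 setup as above:

import * as React from 'react';
import * as d3 from 'd3';

// a generalized legend: the parent decides what the legend shows
class MapLegend extends React.Component {

  componentDidMount() {
    // colors and labels now arrive as props from the parent
    const { colors, labels } = this.props;

    let g = d3.select('#legend')
        .append("g")
        .attr("transform", "translate(500, 525)");

    // one swatch per color the parent hands us
    g.selectAll("rect")
        .data(colors)
        .enter()
        .append("rect")
        .attr("fill", function(d){ return d; })
        .attr("x", function(d, i){ return (i*30); })
        .attr("y", 30)
        .attr("width", 30)
        .attr("height", 20);

    // one label per class break
    g.selectAll("text")
        .data(labels)
        .enter()
        .append("text")
        .attr("font-size", "10px")
        .attr("x", function(d, i){ return (i*30); })
        .attr("y", 60)
        .text(function(d){ return d; });
  }

  render() {
    return <svg id='legend'></svg>;
  }
}

export default MapLegend;

// usage: <MapLegend colors={colorLegend} labels={labelsLegend} />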

The other way of doing things is what I like to call writing D3 code the React way. Taking this approach, React takes the reins of explicitly rendering things, to the point where <path> or <circle> elements become components. Let's take a look at how to render a simple map of the United States using this approach.

// import react
import * as React from 'react';
// be sure to import d3 - run npm install --save d3 first!
import * as d3 from 'd3';
// also import topojson so we work with topologically clean geographic files
// this will improve performance and reduce load time
// npm install --save topojson
import * as topojson from 'topojson';
// also import lodash
import * as _ from 'lodash';
// and import a custom component called State
// we'll get to this a bit later
import State from './State';

class Map extends React.Component {

  constructor(props) {
    super(props);
    this.state = { us: [] };
    this.generateStatePath = this.generateStatePath.bind(this);
  }
  
  componentDidMount() {
    // send off a request for the topojson data
    d3.json("https://d3js.org/us-10m.v1.json", (error, us) => {
      if (error) throw error;
      
      // store the data in state
      this.setState({
        us
      });
      
      // I prefer to use Redux for state management - if you're taking
      // that approach this would be a good place to dispatch an action, i.e.
      // this.props.dispatch(actions.sendAPIDataToReducer({ us }));
    });
  }
  
  // let's define a method for rendering paths for each state
  generateStatePath(geoPath, data) {
    // map over every state feature, returning a State component for each
    let states = _.map(data, (feature, i) => {

      // generate the SVG path from the feature's geometry
      let path = geoPath(feature);
      return <State path={path} key={i} />;
    });

    return states;
  }
  
  render() {

    // create a geographic path generator - this topojson is pre-projected,
    // so no projection needs to be passed in
    let geoPath = d3.geoPath();

    // convert the topojson to an array of GeoJSON state features
    // (or pull the data from your reducer, whichever you are using)
    let us = this.state.us;
    let data = _.isEmpty(us) ? [] : topojson.feature(us, us.objects.states).features;
    
    // call the generateStatePath method
    let statePaths = this.generateStatePath(geoPath, data);
    
    return (
        <svg id='map-container'>
            <g id='states-container'>
                {statePaths}
            </g>
        </svg>
    ); 
  }
}

export default Map;

Alright, so let's dig into this piece of code. Here we are actually generating the JSX for our states inside of a separate method on our component called generateStatePath. You'll notice generateStatePath takes two arguments - geoPath and data. geoPath is just D3's geographic path generator. It's what allows us to work dynamically with things like projected coordinate systems in D3 (which is pretty darn sweet for those of us who've spent many a day in hell reprojecting static maps in ArcGIS). The second argument is our data - in this case, a 10m resolution topojson file of every state in the United States, which we fetch using the d3.json method and store in the component's state. Notice I make a comment about using Redux to store the topojson data rather than the component's state. The case for this is that it makes your topojson available to any component in your application, which becomes very useful for doing analysis or making different maps with the same root data.

The generateStatePath function then simply maps over every feature in the dataset, returning a State component for each. The result is an array of State components, which get rendered inside our <g></g> tag. Let's check out what this State component looks like.

import * as React from 'react';

const State = (props) => {
  
  return <path className='states' d={props.path} fill="#6C7680"
  stroke="#FFFFFF" strokeWidth={0.25} />;
};

export default State;

Our State component is really just an SVG path that gets rendered using the path prop handed down by our parent Map component. That path gets generated by the call to geoPath(feature), where geoPath is the generator returned by d3.geoPath(); it takes the input state feature as a GeoJSON object and outputs an SVG path string that can be rendered on our page. You'll notice here we're applying the Stateless Functional Component paradigm to our component. Rather than using an ES6 class, we define our component as a function, which simply returns some JSX based on the props we provide it as an argument.

Here I suggest we're writing D3 code the React way because we're applying React's component hierarchy philosophy to rendering our D3 elements. What I appreciate about this approach is the ability to control the rendering of HTML elements at a very precise level – down to the point where we can write custom functions to handle all sorts of UI scenarios. These elements are now also subject to React's diffing algorithm, meaning that updates to data stored in Redux or in the component's state will trigger updates to these elements on the UI. With something like react-transition-group, you can begin to implement all of the beautiful magic behind D3 transitions using React!
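
As a rough sketch of that last idea – assuming react-transition-group v2's CSSTransition API, with class names and timings that are purely illustrative – a transition-enabled states group might look like this:

import * as React from 'react';
import { TransitionGroup, CSSTransition } from 'react-transition-group';
import State from './State';

// each State path now enters and exits with a CSS-driven fade;
// the animation itself lives in your stylesheet under .state-enter,
// .state-enter-active, .state-exit, and .state-exit-active
const StatesGroup = ({ features, geoPath }) => (
  <TransitionGroup component="g">
    {features.map((feature, i) => (
      <CSSTransition key={i} timeout={750} classNames="state">
        <State path={geoPath(feature)} />
      </CSSTransition>
    ))}
  </TransitionGroup>
);

export default StatesGroup;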

If you're curious to see some of this code in action, check out this Github repo. This is an app I've been developing to help make sense of crowdsourced police shootings data gathered by the Guardian. An older, pure D3 / JS version of the app can be seen on my portfolio or here. I wanted to test myself to see if I could develop the same (or better) experience in React, using other cool technologies like Sass and redux-sagas. In my next post, I'll talk a bit about techniques for responsive typography in map focused layouts using Sass maps. More to come as well on async data loading using redux-sagas and axios. Thanks as always for reading.

Project #9: Building Interactive D3 Dashboards with CARTO Web Maps

This post comes from a piece I wrote as part of Azavea's Summer of Maps program. To read the original post, head to the Azavea website.

Dashboards are all the rage these days. The talented folks at Stamen and Tableau have shown just how powerful a well-designed dashboard can be for integrating statistical and geospatial stories on a single web page. I wanted to see if I could emulate their work in an interactive web application I’ve been developing for Transportation Alternatives that explores correlations between traffic crashes and poverty in New York City. But rather than using ready-made software like Tableau, I wanted to build and design the application through code using two powerful open-source tools – CARTO and D3. This post will go through the design process I used for building my dashboard, highlighting some of the key pieces of code needed to get CARTO and D3 talking to one another.

Building from scratch means writing a good bit of code, pulling mostly from the CARTO.js and D3.js libraries. This post does not cover these libraries in depth, but if you’re looking to learn more check out CARTO’s Map Academy or Scott Murray’s D3 tutorials. This post also does not cover basic HTML and CSS, which are needed to add elements and styles to your web page. Learn more about both of these languages from the World Wide Web Consortium (W3C).

Adding Dropdown Menus

To design my dashboard, I began by thinking carefully about user experience. Only essential information should be present to the viewer at any one time. Moreover, the dashboard should be easy to navigate. Too many options or poorly labeled graphics confuse even savvy users. For this reason, I placed all layer selectors and zoom selectors into two clean dropdown menus using HTML’s <select> and <option> tags.

<div id="dashboard">
    <div id="dashboard-title-container">
        <div id="dashboard-title">EXPLORE DATA ON NYC <br> CITY COUNCIL DISTRICTS</div>
  </div>
  <div id="selector_container">
    <div id="selector_menu">
        <form>
            <div style="display: inline-block;">
                <label for="layer_selector" style = "padding-bottom: 5px;">Select a Visualization</label>
                <select id="layer_selector" class="selector" style="margin-right: 10px;"
 <option value="intensity" data="torque" data-type="cartocss">Heat Map of Crashes</option>
                    <option value="area" data="choropleth-area" data-type="cartocss">Crahes per Square Mile</option>
                    <option value="popdensity" data="choropleth-density" data-type="cartocss">Population Density</option>
                    <option value="medincome" data="choropleth-bubble" data-type="cartocss">Median Income</option>
                    <option value="povrate" data="choropleth-povrate" data-type="cartocss">Poverty Rate</option>
                    <option value="unemprate" data="choropleth-umemployment" data-type="cartocss">Unemployment Rate</option>
                </select>
            </div>
            <div style="display: inline-block;">
                <label for="zoom_selector" style = "padding-bottom: 5px;">Zoom to a Borough</label>
                <select id="zoom_selector" class="selector">
                    <option value="bronx">Bronx</option>
                    <option value="brooklyn">Brooklyn</option>
                    <option value="manhattan">Manhattan</option>
                    <option value="queens">Queens</option>
                    <option value="statenisland">Staten Island</option>
                </select>
            </div>
        </form>
    </div>
  </div>
  <div id="d3-elements">
  </div>
</div>

The result of the above HTML – two clean, user-friendly dropdown menus.

Dropdown menus are a powerful design choice because they are familiar to most users and can hold many options that would otherwise take up too much space. Getting these menus to do something just involves a bit of jQuery directing what should happen when each option is selected. For example, in the code below, when someone chooses a borough from the zoom selector, the map should pan to that borough.

var ZoomActions = {
  bronx: function(){
    var bronxXY = [[40.893719, -73.918419],[40.797624, -73.853531]] 
    map.fitBounds(bronxXY);
  },
  brooklyn: function(){
    var brooklynXY = [[40.695664, -74.048538],[40.581554, -73.8580653]] 
    map.fitBounds(brooklynXY);
  },
  manhattan: function(){
    var manhattanXY = [[40.804641, -73.968887],[40.709719, -73.978157]] 
    map.fitBounds(manhattanXY);
  },
  queens: function(){
    var queensXY = [[40.808, -73.974],[40.691499, -73.752]] 
    map.fitBounds(queensXY);
  },
  statenisland: function(){
    var statenislandXY = [[40.6534, -74.2734],[40.498512, -74.128189]] 
    map.fitBounds(statenislandXY);
  }
};

$('#zoom_selector').change(function(){
    ZoomActions[$(this).val()]();
});

Notice how we changed the dropdown to Brooklyn and we are now zoomed into downtown Brooklyn. jQuery is a powerful framework for handling DOM-Javascript interaction.

Adding Graphics with D3

The trickier part of the dashboard, of course, is actually adding the good stuff – graphics that respond interactively to user input. To get a sense of what’s possible with D3, check out this gallery. I decided to stick with two classics for my dashboard – the scatterplot and the bar chart.

The scatterplot is particularly powerful as a simple, intuitive graphic that shows relationships between two variables – say, the poverty rate of NYC city council districts against the density of traffic crashes. Making one with D3 involves four major steps: 1) loading data and setting up scales, 2) positioning your data, 3) making some axes, and 4) adding animation. This same basic workflow goes for bar charts as well.

Loading Data and Setting Up Scales

Loading data is fairly straightforward in D3 – just choose the function that matches your file type. I loaded data stored in a GeoJSON using the d3.json function, but you could load a CSV, for example, using d3.csv. Once your data is loaded, you can begin to set up scales. Scales can take a while to conceptualize and get comfortable with, but the basic premise is to take a domain of real world data (e.g. median incomes ranging from $30,000 to $120,000) and scale it across a range of pixel values (e.g. over 300px on your monitor). Scales do this math for you and allow your data to be handled dynamically even as it grows. Here’s all the code you need to set up a basic linear scale:

// set the width, height, and margins of your svg container
var margin = {top: 35, right: 35, bottom: 35, left: 35},
    w = 300 - margin.left - margin.right,
    h = 225 - margin.top - margin.bottom;

// create the x scale, ranging from 0 to the maximum value in our dataset
var xScale = d3.scale.linear()
    .domain([0, d3.max(dataset, function(d){
        return d.properties.pop_density;
    })])
    .range([0, w]);

// create the y scale, ranging from 0 to the maximum value in our dataset
var yScale = d3.scale.linear()
    .domain([0, d3.max(dataset, function(d){
        return d.properties.crash_area;
    })])
    .range([h, 0]);
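
For completeness, here is a hedged sketch of the loading step that produces the dataset variable used above; the file name is a placeholder for wherever your GeoJSON lives:

// load the GeoJSON of council districts, then build scales and draw
d3.json("council_districts.geojson", function(error, data) {
    if (error) throw error;

    // the array of district features referenced by the scales above
    var dataset = data.features;

    // ... set up xScale and yScale, then draw the scatterplot
});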

Positioning Your Data

With data and scales in hand, we now need a place to put it. This is where scalable vector graphics (SVG) come in. For now, think of SVG as the container that holds our data; or, for the more artistic types, as our canvas. The key thing to know right off the bat with SVG is that the (0,0) point, from which all attributes are referenced, is in the top left of the container – that’s your starting point. Positioning data, then, is as simple as using the scales you set up above to give your data the appropriate (x, y) coordinates based on their values. You can do this using these few lines of code:

// create the scatterplot svg
var svg = d3.select("#d3-elements") // finds the dashboard space set aside for d3
    .append("svg") // adds an svg
    .attr("width", w + margin.left + margin.right) // sets width of svg
    .attr("height", h + margin.top + margin.bottom) // sets height of svg
    .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

// make a scatterplot
svg.selectAll("circle")
    .data(dataset) // gets the data
    .enter()
    .append("circle") 
    .attr("cx", function(d){
        return xScale(d.properties.pop_density); // gives our x coordinate
    })
    .attr("cy", function(d){
        return yScale(d.properties.crash_area); // gives our y coordinate
    })
    .attr("r", function(d){
        return 4; // sets the radius of each circle
    })
    .attr("fill", "#1DFF84")
    .attr("opacity", 0.5);

Look at that beautifully positioned data! Applying 50% transparency to the points allows overlapping data to be highlighted.

Making Some Axes

So we have our data, it’s scaled and positioned, but we can’t actually understand the values they represent because we have no axes. Fortunately, D3 axes are a breeze. Essentially, we just call d3.svg.axis(), feed it our x and y scales, set a few styling parameters, and tack it onto the svg like so:

var xAxis = d3.svg.axis()
    .scale(xScale) // feed the xAxis the xScale
    .orient("bottom")
    .ticks(5);

var yAxis = d3.svg.axis()
    .scale(yScale) // feed the yAxis the yScale
    .orient("right")
    .ticks(5);

// append the x axis to the svg
svg.append("g") // creates a group so we can store lines and tick marks together
    .attr("class", "x axis")
    .attr("transform", "translate(0," + h + ")")
    .call(xAxis);

// append the y axis to the svg
svg.append("g") // creates a group so we can store lines and tick marks together
    .attr("class", "y axis")
    .attr("transform", "translate(0,0)")
    .call(yAxis);

Adding axes and a title helps to make the data readable and understandable. In this case, we see that there is a strong positive correlation between population density and crash density. But what about other variables?

Adding Animation

At this point, we have a beautiful scatterplot, but the data is still static. The whole point of using D3 is to create graphics that change dynamically. That’s where transitions come in. Transitions are exciting, lively, and make your data come to life. We can also wire transitions in the graphics to changes in map layers, such that we get two views of the same data at once.

Notice that we have the Poverty Rate layer selected in the dropdown, the Poverty Rate layer is showing on the map, and the scatterplot is showing the relationship between Poverty Rate and Crash Density. Ah, the beauty of integrated data views.

Fortunately, D3 makes transitions easy to implement. The general idea is to bind the new dataset (the values we’re transitioning to), update our scales to match this dataset, start a transition for a set duration of time, and move the circles to their new positions. Within the Javascript function that controls the dropdown menu, just add a few lines of code:

// update your dataset first
var dataset = data.features; // new data here

// then update the x scale to match the new data
// (no need to update the y scale if its values don’t change)
var xScale = d3.scale.linear()
    .domain([0, d3.max(dataset, function(d){
        return d.properties.pop_density;
    })])
    .range([0, w]);

// update the circles
svg.selectAll("circle") // select all the data points
    .data(dataset) // bind the new dataset
    .transition() // transition!
    .duration(1500) // for 1500 milliseconds, or 1.5 seconds
    .attr("cx", function(d){
        return xScale(d.properties.pop_density); // get new x positions
    })
    .attr("cy", function(d){
        return yScale(d.properties.crash_area); // get new y positions
    })
    .attr("r", function(d){
        return 4;
    })
    .attr("fill", "#1DFF84")
    .attr("opacity", 0.5);

Adding transitions gets a bit more complicated when the size of your dataset changes. Doing that requires diving more deeply into D3’s enter() and exit() functions and thinking more carefully about data joins. When in doubt, check out some of the examples by D3 creator Mike Bostock.
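
For reference, a bare-bones version of that pattern – using a hypothetical id property as the join key – looks like this:

// bind the new data with a key function so D3 can match old and new circles
var circles = svg.selectAll("circle")
    .data(dataset, function(d){ return d.properties.id; });

// remove circles whose data left the dataset
circles.exit().remove();

// append a circle for each brand new data point
circles.enter()
    .append("circle")
    .attr("r", 4)
    .attr("fill", "#1DFF84")
    .attr("opacity", 0.5);

// in D3 v3, entering circles join the update selection,
// so one transition moves everything to its new position
circles.transition()
    .duration(1500)
    .attr("cx", function(d){ return xScale(d.properties.pop_density); })
    .attr("cy", function(d){ return yScale(d.properties.crash_area); });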

Bringing It All Together

By tying your D3 interactivity to your CARTO layer selector, your dashboard and your map can talk to one another. They help to tell the same story in different ways, pointing at nuances in the data that fit the needs of multiple users all in a simple, clean interface. To access the entire source code for this example, head over to this Summer of Maps Github repo. To play with the app yourself, head here. Happy dashboarding!

Project #8: Using NAIP Imagery and a Texture Raster to Model the Urban Forest

This post comes from a piece I wrote as part of Azavea's Summer of Maps program. To read the original post, head to the Azavea website.

As high-resolution satellite imagery becomes more accessible by the day, geographers, urban planners, and ecologists have begun to explore its potential for modeling the urban forest. However, attempts at precise urban canopy classifications often run into a few predictable problems. Imagery at 30m or even 15m resolution still cannot resolve individual street trees or isolated stands. Classification algorithms confuse grass and other scrubby vegetation for the canopy. Shadows cast by adjacent trees affect spectral returns, leading to further misclassification. In short, mapping the urban forest in an accurate, detailed, and inexpensive way remains a challenge.

This post walks through a method to help increase accuracy of classification by using a texture raster to distinguish between trees and other vegetation.

What is NAIP imagery?

In 2012, Chris Behee, a GIS Analyst for the City of Bellingham, Washington, introduced a fascinating methodology for working around many of these difficulties in a presentation at the annual Washington GIS Conference. His methodology takes advantage of a rich multispectral imagery repository – the National Agricultural Imagery Program (NAIP).

For those unfamiliar with NAIP imagery, it is collected (as of 2007) in four bands – the red (R), green (G), and blue (B) visible bands, and the near-infrared (NIR) band – at 1m resolution for the conterminous United States.

This makes it some of the highest resolution multispectral imagery that is publicly available today.

A composited NAIP image of the city of Portland, OR in 2014 displayed in true color (R = Band 1, G = Band 2, B = Band 3).

A composited NAIP image of the city of Portland, OR in 2014 displayed in color-infrared (R = Band 4, G = Band 3, B = Band 2).

I decided to apply Behee’s methodology to my current work with Ecotrust, a Portland, Oregon-based non-profit. The project aims to use high-resolution imagery to:

  1. model change to Portland’s urban canopy over the last decade; and,

  2. explore correlations between the distribution of the urban canopy and socioeconomic and environmental factors.

Generating Classification Inputs

Behee’s methodology is based on using the iso cluster unsupervised classification algorithm with six different raster inputs – the four bands of the NAIP image, a Normalized Difference Vegetation Index (NDVI) raster, and a ‘texture’ raster. These last two layers are the real key to the classification. The NDVI raster is particularly adept at distinguishing between vegetated and non-vegetated surfaces. It does this by exploiting the tendency for vegetation to reflect strongly in the NIR band compared with other land cover types. However, the NDVI raster does not distinguish between trees and other scrubby vegetation or deal with shadows. This is where the ‘texture’ raster comes in.

The NDVI of the NAIP composite displayed using a diverging color ramp. You can use the Image Analysis toolbar to automatically generate an NDVI from an input raster or manually calculate it using Raster Calculator.

The NDVI of the NAIP composite reclassified using a threshold. Remember that this layer gives a good indication of all vegetated surfaces, but does not distinguish trees from grass.

The Texture Raster

The idea here is that rather than relying solely on the spectral information of each individual cell in our raster, we should take into account the spectral information of the ‘neighborhood’ around the cell. If the surrounding cells have similar values, we can assume they are reflecting light similarly and that their texture is fairly consistent (like a mowed grassy field). But if they display greater variation in their values, we can assume they are reflecting light somewhat differently and that their texture is rougher (like the crown of a mature tree). In this way, the ‘texture’ raster provides information about not only color, but also shape.

To obtain this final layer, we perform a Focal Statistics operation on both the G band and the NIR band of our NAIP image with a Range statistic and a 7 x 7 window. At 1m resolution, a 7 x 7 window corresponds roughly to the area of a mature tree crown. We finish by averaging the values of these two rasters together using Raster Calculator.

The ‘texture’ raster of the NAIP image. Notice how the Willamette River (center) appears very deep blue (low value, smooth surface) while Forest Park (left) appears a tan brown (higher value, rougher surface).

A zoomed in view of the ‘texture’ raster and the original image. Notice how flat, grassy surfaces appear blue (low values), while the canopy appears tan brown (higher values).

Running the Classification Algorithm

Loading these six inputs into my classification yielded some pretty exciting results. Obtaining a suitable level of precision did require requesting more classes than needed in the original unsupervised algorithm. I started with 50 classes and then manually recombined them to form a single canopy layer through visual inspection. I also engaged in several post-classification smoothing techniques, including applying a Majority Filter to take care of small holes in canopied areas and using a thresholded ‘texture’ raster as a mask to more precisely pick out trees in shadowy areas.

The output canopy layer from the NAIP imagery after post-classification smoothing and masking. Notice there are still a few holes in canopied areas of Forest Park – those pesky shadows!

The output canopy layer overlaid on the original image (top) and the original image (bottom) for comparison. The classification lines up nicely with the canopy, although it does tend to overestimate crown boundaries in certain places. It also still suffers from some difficulties distinguishing shadows.

Classification Results

Ultimately, I found that Behee’s methodology, in conjunction with some post-classification smoothing and masking, created a pretty impressive canopy layer. There were lingering issues with detecting trees in shadowed parts of the image, which remains the major drawback of this approach. But compared with the more labor intensive process of gathering training data for a supervised classification, this method is far more lightweight and efficient, and works well for obtaining quick and precise maps of the urban forest.

You also don’t have to be a remote sensing expert to get it up and running! To obtain NAIP data for your state or city, head here or search for individual tiles on Earth Explorer.

Project #7: Build a Map of Eviction in San Francisco using CartoDB.js, or, How to Make a Map You Believe In

Throughout this semester, through all of the small victories and the frustrations and the beautiful journey of self-directed learning, I've always had one key goal at the front of my mind – make a map, exclusively through code, that I believe in. This has become a more and more difficult challenge for me as I've expanded my cartographic skills and interests into different realms and more complex politics. I have come to expect more of myself, both in terms of the design and the aesthetics of my maps as well as the epistemologies and methodologies that underlie them. I don't know if I would necessarily go so far as to call all of my maps "lies" (as Monmonier might), but, in practicing self-reflexivity in my cartographic design process, I have learned that they are certainly far from being objective "truth". Maps reflect their creators' perspectives and views of the world; not only that, they channel particular ways of viewing, seeing, engaging with, and understanding the world that are dependent on the tools used to make them. That will always be the map's limitation. But rather than seeing maps only for their limitations, for their role as imperfect representations of a world that is inherently messy and complex, perhaps we can think of them differently – as individual experiences and expressions that can have meaning for others beyond the cartographer. Pieces of art that can inspire. Beautiful data that can inform. Little worlds unto themselves.

A screenshot of my interactive map of eviction in San Francisco. The map comprises seven different data layers, each offering a unique view of the geography of eviction in the city.

This philosophy influenced how I designed this map of eviction in San Francisco. When I was out in the Bay for AAG, I found myself often interested in exploring the geographies of the city even more than those I addressed at the conference. Having the chance to walk through the Mission, the Castro, the Sunset, the Richmond, to look and observe and understand so little (but feel so much) about those places, to get a momentary picture of a city undergoing fundamental and violent change – it forced me to think deeply and critically about my own experiences living in cities, and my own participation in gentrification and forced displacement. My own complicity. For this reason, I wanted to try to reckon with this part of my history and my identity through the crafts that I love – cartography and code.

Fortunately, the city of San Francisco has a pretty good resource for understanding demographic and social change within its boundaries – SF OpenData. From here, I was able to find a dataset listing every eviction notice filed between January 1, 1997 and December 31, 2015 in the city of San Francisco. Now, it is important to note here that 1) an eviction notice does not necessarily mean that an eviction occurred, and 2) that not all evictions that occur have legal eviction notices filed for them. Hence, the data visualized in this map is an incomplete picture, one that both adds and takes away; however, it does at least give us a strong starting point for understanding how eviction has moved and shifted across the city over the past 19 years. I also used SF OpenData to grab the neighborhood polygons as a KML file. The data on median house value and demographics comes from the American Community Surveys (2005-2009) and the 2010 Census, reaggregated by the San Francisco Planning Department to fit the boundaries of the neighborhoods (we could've used our nifty AWR tool to do that as well).

So that's a fair bit of data to be dealing with and trying to render on the fly on an interactive map. Given that, I felt strongly that I needed a service that would let me create and host a spatial database without building one from scratch on my own machine. So, instead of going through Leaflet and using GeoJSON data (like Joe and I did for the Zuni map), I decided to use CartoDB. CartoDB offers users a PostGIS backend for creating spatial databases, which can be altered and queried using PostgreSQL commands. In this sense, CartoDB essentially acts as a database in the cloud, communicating with an interactive map somewhere else on the internet through its APIs. This makes life quite a bit easier, especially for those of us with elementary database design skills.
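
For example, CARTO's SQL client lets you query that cloud database straight from the browser. A quick sketch – the user name, table name, and columns here are placeholders:

// query the PostGIS table behind the map directly from the browser
var sql = new cartodb.SQL({ user: 'your-carto-user' });

sql.execute("SELECT neighborhood, COUNT(*) AS notices FROM eviction_notices GROUP BY neighborhood")
    .done(function(data) {
        console.log(data.rows);   // one row per neighborhood
    })
    .error(function(errors) {
        console.log(errors);
    });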

Now, there are two ways you can build a web map using CartoDB. The quick and dirty way is to go through the Editor, which essentially provides a lot of graphical user interfaces (GUIs) for you to use to create your map. It's pretty and easy-to-use, but it is also rife with defaults that are hard to shake. The harder way, but the one that allows for much more customization, is to use CartoDB's javascript library, CartoDB.js. This library is awesome because it has a lot of built-in functions to help you do the heavy lifting (sort of like tools in ArcGIS), but it allows you to play with these functions or create new ones as you need to. The difficulty, of course, is that not everything that you may want to do is included in the documentation – often, you have to spend hours combing through Stack Exchange to find good hacks for playing with the library. Other times you make wild guesses using some jQuery that somehow work (that happened at least twice for me). It's all a process of trial and error, and thankfully one that gives instantaneous feedback and has a lot of wiggle room. If you're interested in finding good hacks for CartoDB.js, I would recommend starting with this blog post by Daniel McGlone at Azavea, my soon to be mentor for the summer. This map is coded using CartoDB.js.

Click here to access the HTML used to make this map. A lot of things are happening in this map, so I won't go over absolutely everything. Instead, I'm going to focus on the parts that I think are cool or interesting and that may be helpful for future developers. So here we go.

You'll see early on in the code that we have a bunch of <style> tags, many of which have bizarre-looking CSS inside them for different visualization types. These are actually written in CartoCSS, a language developed by Mapbox specifically for styling maps on the web. The reason these are included here is that we have a function later in our code, called createSelector, that does some pretty nifty stuff. First, there are event listeners tied to the various buttons on the map (i.e. All Ellis Act Eviction Notices, the Timelapse Map, the Heat Map, etc.), such that when one of these buttons is clicked, the createSelector function is run on that visualization. The function itself actually reaches into the HTML of the selected visualization, finds the CartoCSS attached to that (all that stuff stuck between the <style> tags), applies that CartoCSS to the data, and then adds it to the map. It's a complicated and multi-tiered process, but one that is really useful for applying many different styles to data with a relatively small amount of code. To learn more about how to do this for yourself, check out this tutorial from CartoDB.
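
Stripped of the details, the pattern looks roughly like this (the button class, style ids, and the sublayer variable are placeholders; setCartoCSS is the CartoDB.js call doing the actual restyling):

// each visualization button's id matches the id of a <style> block holding its CartoCSS
$('.viz-button').on('click', function() {
    createSelector(sublayer, $(this).attr('id'));
});

function createSelector(sublayer, visualization) {
    // reach into the page, grab the CartoCSS stuck between the <style> tags,
    // and hand it to the CARTO sublayer to restyle the data on the fly
    var cartocss = $('#' + visualization).html();
    sublayer.setCartoCSS(cartocss);
}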

Another cool element of this map is the custom HTML I wrote for the infowindows and the legends. The infowindows appear when you click on a neighborhood within the "Eviction Notices by Neighborhood" layer. The HTML I wrote for these little guys adds a nice black background to the neighborhood name, and then gives some information about the total number of eviction notices in the neighborhood since 1997, the median home value in 2010, and the racial makeup of the neighborhood in 2010. These are written using mustache templates, which have the benefit of allowing you to pull values from your PostGIS table into the HTML directly. Sweet! You'll also notice that two layers – the neighborhood layer and the heat map layer – have custom legends associated with them. I wired these legends to click events such that they only appear when the layer they describe is active – otherwise, they are turned off. Building custom legends is surprisingly easy; Daniel does a great job explaining how to do it in the blog post linked above. Again, it mostly involves writing some special HTML and CSS and getting your code to reach in there when a certain button is clicked. The powerful part of this functionality is that it really allows you to control the design of your map in a whole new way, such that you're not stuck with the limited options CartoDB gives you by default.
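
The moving pieces, roughly – the template id, column names, and button/legend ids here are placeholders for the real ones:

// point the sublayer at a mustache template stored in the page;
// {{columns}} in the template get filled from the PostGIS table
sublayer.infowindow.set('template', $('#infowindow_template').html());

// only show a legend while the layer it describes is active
$('#neighborhood-button').on('click', function() {
    $('#neighborhood-legend').show();
    $('#heatmap-legend').hide();
});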

Perhaps the flashiest bit of functionality that this map brings is the timelapse visualization of eviction notices. This layer allows you to either 1) play a ~4-minute animation showing the geographic distribution of eviction notices filed, month by month, from January 1997 to December 2015 (1 month / second), or 2) slide to any month within that time period to see what the distribution looks like 'frozen in time'. Making this visualization required the use of another one of CartoDB's Javascript libraries – Torque.js. Torque.js was developed for visualizing time series data and it's one of the coolest ways I've seen to tell cartographic stories that shift through time. It's almost like being able to fit 216 small multiples into one static layout, while at the same time not completely overwhelming the map reader. Setting up a torque layer is actually relatively simple in CartoDB.js. It mostly involves creating a special Javascript object that calls the timestamp column in your PostGIS table, and then setting some CSS to control how this data changes styles as the animation runs. The most important choice for me was thinking about a color rule to use to show differences between locations with only one eviction versus those with many evictions. For example, let's say that only 1 eviction notice is registered in a building in the city on any given day, while 8 eviction notices are registered in a separate building. With no color rule applied, these locations will look the same – only one dot, representing one eviction notice, will be visible. Using a color rule helps to bring out these differences by changing how overlapping dots interact. For this torque visualization, I used a color rule called Lighter, which creates a brighter shade of blue for locations where eviction notices overlap. On a dark background like mine (designed by the highly-talented folks at Mapbox), this effectively draws the map reader's attention towards these areas on the map, allowing them to pick out areas of particularly aggressive displacement.
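
Very roughly, and from memory – treat the exact object shape as an assumption and check the Torque.js docs, and note that the user, table, and column names are placeholders – the layer definition looks something like this, with the CartoCSS carrying both the timestamp column and the Lighter color rule:

cartodb.createLayer(map, {
    type: 'torque',
    options: {
        user_name: 'your-carto-user',
        table_name: 'eviction_notices',
        query: 'SELECT * FROM eviction_notices',
        cartocss: [
            'Map {',
            '  -torque-frame-count: 216;',                          // one frame per month in the data
            '  -torque-time-attribute: "file_date";',               // the timestamp column in the PostGIS table
            '  -torque-aggregation-function: "count(cartodb_id)";',
            '  -torque-resolution: 2;',
            '}',
            '#eviction_notices {',
            '  marker-fill: #2167AB;',
            '  comp-op: lighter;',                                  // overlapping notices render brighter
            '}'
        ].join('\n')
    }
}).addTo(map);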

Ultimately, I feel satisfied with how this map turned out. As someone who is obsessed with details, with color, type, flow, kerning, tracking, visual hierarchy, and data legibility, I think I will always find something to critique about this map. But the time has come for me to move on from it as my last days at Middlebury come to a close. I don't doubt that I will continue to tweak it somewhere down the line, to add or take away or rework, but in this moment I feel a sense of peace with it. It feels like a map I can start to believe in, even if it doesn't feel completely done. And perhaps, at the end of all of this, it's as much of a symbol for me as an object in the world for others to learn from. A symbol that I can teach myself some pretty complicated things and come out with something on the other side. A symbol that I can use my skills to work for social justice, for marginalized voices, for stories that are not told. A symbol that my positionality will always place me in difficult political circumstances around the maps I make, in positions of power where I ought not to have it. Because, really, who am I to make these maps? What stake or experience do I have in these communities? What right do I have to tell these stories? These are questions I have not yet answered, and they are questions that require and demand answering. But I think these questions are also the starting places for critical cartographers. We start by engaging with those very things that challenge and frighten and discomfort us most. We start by attempting to do good while also mucking things up, and we learn from that. We start by using what we know to help others make sense of the world, messy and complicated and contradictory as it is. We start by making maps that we believe in.

Special thanks again to Professor Joseph Holler, Middlebury College Department of Geography, for his unwavering support of this senior research. Thanks are in order as well for Professors Kacy McKinney and Jeff Howarth, the two people who inspired my love for geography and cartography nearly four years ago. And, of course, to my parents, Chris and John, for believing in this kid and all of his wacky dreams, and for always showing me the way.

Thank you for reading. Stay posted for future reflections on my upcoming adventures as a Summer of Maps fellow at Azavea!

Project #6: Help Joe Build an Interactive Map using Leaflet for the Zuni Youth Enrichment Project (ZYEP), or, How to Merge Cartography and Community-Based Work

Anyone familiar with the contemporary cartographic landscape knows that the climate is shifting swiftly away from the static and towards the interactive. People want maps they can play with, maps that are responsive to user needs, and maps that can be accessed with nothing more than an internet connection and a few directed search terms. Particularly for those seeking to create a community resource (or just something that folks can access on the go), a web map is a good design choice. Yet I personally still find something uniquely beautiful and important about static, print maps. Perhaps it is Jeff Howarth's voice in the back of my head, reciting a treatise on the pleasure of dwelling with a well-designed map. Perhaps it is the frustration I get from a little extra padding hidden somewhere deep in my CSS, mucking up my legends and selectors. Even so, it is hard to ignore the simple reality that web maps are 1) just plain cool, and 2) a powerful way for democratizing, distributing, and telling stories through geographic data. Their beauty lies in their ability to create multiple vantage points of the same data nearly instantaneously, and to bring these vantage points to the same map in a harmonious way.

For my first foray into interactive / web cartography, I picked up on a project my advisor, Joe Holler, had started to work on for the Pueblo of Zuni, NM and the Zuni Youth Enrichment Project. Having spent J-term out there collecting data and working with community members to discuss their design needs, Joe graciously decided to include me in the project in April as an assistant developer. The only problem, of course, was that I had only basic HTML and Javascript skills at the time, and literally no familiarity with the library we'd be using to create the map – Leaflet.

A screenshot of the Leaflet map Professor Holler and I created for the Zuni Youth Enrichment Project (ZYEP). To see this map in action, head over to my portfolio.

But as I've discovered throughout this senior research, the process of self-learning doesn't hinge so much on your prior familiarity or formal instruction with the tools. It really hinges more on 1) your ability to find good, reliable help at your level, 2) the availability of a worked example to base your work on, and 3) the stubbornness to persist in the face of setbacks. Fortunately, I had all three to help me with this project. Because Leaflet is an open-source library maintained and improved by a global user community, there is a strong support community for it online (particularly on Stack Exchange). People have figured out and shared all kinds of nifty hacks to make challenging things happen; moreover, their explanations of their code are surprisingly coherent for a programming language that can so easily behave in unpredictable ways. Joe was also kind enough to provide a worked example of a Leaflet map he had made so that I could parse through its different elements and understand how they work. While things like pulling map tiles from a tile server or rendering vector features from a GeoJSON object are not particularly difficult to pick up, other bits of interactivity (turning layers on and off, changing layer symbology based on multiple variables, etc.) can be. Lastly, I drew on a healthy dose of stubbornness to make it through this project. Each fix often introduces its own problems, and being able to maintain a positive trajectory really comes down to not getting discouraged. This also entails celebrating every little victory you can get.

Click here to access the HTML file used to create this map. Note that this HTML file is just the raw HTML – the map will not instantiate without the associated Javascript files. In that sense, it's more useful for getting a sense of how the map works rather than actually using it. To play with the actual map, head to this Github page I set up for the project.

So let's break down what's happening in the code here. First, the map has two base layers that users can toggle between, the topo map coming from the USGS and the satellite map from Mapbox. Including both these maps helps the user to acquire different information on the landscape, from topography and geomorphology to color. These two base maps are controlled through an array that is cycled through when users make a selection from the layer control in the top right. On top of the base maps are 5 different GeoJSON vector layers, one for trails, playgrounds, landmarks, distance markers, and bike paths, respectively. These files are basically not much more than an array of coordinates, read by Leaflet's L.geoJson function and placed on the map in the correct location. These are also controlled through a larger array (meta-array, perhaps?) passed to the layer control. The interesting thing here is that two of these layers are tied to the legend, such that the legend actually changes as layers are added and removed. This is done by connecting a legend update function to an array of HTML items, and then using a for loop that fills / removes the inner HTML of the legend with its corresponding array entry when those layers are turned on and off. Another cool feature of this map is the use of event listeners to affect how the map behaves when the user hovers and clicks on objects. This is fairly standard practice in web maps, but one that helps to signal to the user what feature(s) they're interacting with in any given moment. In our case, hovering over the trails and distance markers changes their color to black (to show they are actively highlighted); hovering on any feature on the map also pops up a label with the feature's name. This is a powerful bit of interactivity, but also one that requires some careful thinking – for example, what happens when a user moves their mouse to another feature? This is where Javascript can get tricky, because we also need to wire functions for removing styles when the user ends their interaction with a given feature.
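
A condensed sketch of those pieces – the tile URLs, layer names, and the trailsData GeoJSON object are placeholders for the real ones:

// two base layers - swap in the real USGS and Mapbox tile endpoints
var usgsTopo = L.tileLayer('https://example.com/usgs-topo/{z}/{x}/{y}.png');
var satellite = L.tileLayer('https://example.com/satellite/{z}/{x}/{y}.png');
var map = L.map('map', { layers: [usgsTopo] }).setView([35.07, -108.85], 13); // roughly centered on Zuni, NM

// one GeoJSON overlay, with hover highlighting and a name popup
var trails = L.geoJson(trailsData, {
    style: { color: '#C26B1F', weight: 3 },
    onEachFeature: function(feature, layer) {
        layer.bindPopup(feature.properties.name);
        layer.on('mouseover', function() { layer.setStyle({ color: '#000000' }); });
        layer.on('mouseout', function() { layer.setStyle({ color: '#C26B1F' }); });
    }
}).addTo(map);

// the control in the top right that toggles base maps and overlays
L.control.layers(
    { 'USGS Topo': usgsTopo, 'Satellite': satellite },
    { 'Trails': trails }
).addTo(map);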

One of the biggest challenges I originally faced with this map was pulling in Mapbox's Maki markers. To do this, I relied on a Leaflet plug-in developed by jseppi on Github. Getting this bit of functionality to work required drawing on Leaflet's L.icon function and setting some custom CSS to style the markers with appropriate colors and sizes. Finally – and this is probably the coolest thing happening on the map – the actual styling applied to the trails layer changes when the basemap changes. I have to give credit here to Professor Holler for getting this to work. When we looked at how the trails layer looked against the USGS topo map, we found that a few of the trails were getting lost in the background due to hue and brightness similarity. To deal with this, we wired up a function that does a few things – 1) checks if the USGS basemap is active, 2) checks if the trails layer is on, and 3) if these two conditions are met, places a semi-transparent reddish-brown line beneath the trails. This effect gives just enough contrast to bring the trails layer to the top of the visual hierarchy.
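
The check itself is simple once the layers exist. A sketch of the idea, reusing the layer names from the example above, where trailCasing stands in for a second, wider, semi-transparent line layer drawn beneath the trails:

// whenever the visible layers change, decide whether the trails need their casing
map.on('baselayerchange overlayadd overlayremove', function() {
    if (map.hasLayer(usgsTopo) && map.hasLayer(trails)) {
        trailCasing.addTo(map);     // the semi-transparent reddish-brown underline
        trailCasing.bringToBack();  // keep it beneath the trails themselves
    } else {
        map.removeLayer(trailCasing);
    }
});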

Ultimately, I'm coming away from this project with a deep sense of accomplishment. To be able to learn a new library from scratch, to write code in three different languages (HTML, CSS, and Javascript) for the same map, and to actually have it work feels remarkable. But more importantly, this project has felt significant from the beginning because it will exist as an object in the world that people will actually use. It will have some utility for the community it serves, helping folks to locate and use services that can help them to live healthier lives. When I'm sitting hunched over my computer here in the GIS lab at Middlebury, it is this thought that excites me most, that makes all of this effort and difficulty and stress worth it. I think every cartographer dreams of having their work mean something beyond a temporary glance or a passing comment, beyond, "Oh hey, look, a map!" I think this project, like those I worked on for my cartographic design seminar, has that added dimension of meaning, of significance. It is, I hope, a map that matters.

Special thanks again to Professor Joseph Holler for his ingenuity and assistance on the development of this map, and for spearheading this project with ZYEP.

 

Project #5: Presenting at the Middlebury Spring Student Symposium (or, How to Demonstrate Your Lunacy to Your Friends)

After the high of AAG, I came back to Middlebury reinvigorated and excited to get back to work with coding. Learning Python scripting in a matter of five weeks, producing a hard coded script to solve a real-world problem, creating an abstracted geoprocessing tool for broader application, and writing my first piece of documentation all felt like significant achievements in such a short amount of time. I wanted to keep the rhythm going, to dive further into PostgreSQL and PostGIS, Javascript, interactive cartography, spatial databases, and the whole slew of crazy plans I dreamt up for myself for the rest of my semester. But before I did that, I felt it was important to share the experiences of this first half with my peers.

Let me preface this section by saying that I have tremendous respect for the academic community that I am a part of at Middlebury. There is so much intellectual curiosity and passion at this institution, and Symposium day really brings that culture to light. Disciplinary boundaries break down, and suddenly you find yourself in front of a room full of people, with your friend, the English major, asking you a question about how you might design a course to engage feminist theory and creative coding. And then you turn around, just an hour later, to go hear your friends speak about Wallace Stevens, public space in Havana, Afro-Latinx identity, and just about any other topic you might dream up. At the end of Symposium day, I always find myself looking at my peers in a new light – impressed, inspired, and grateful that I've found people who care this much about what they do.

Click here to view the slides for my presentation at the 2016 Middlebury College Student Symposium.

Preparing for the Symposium, I thought carefully about how I would frame my ideas for an audience with such diverse intellectual backgrounds. Some folks in the crowd would be GIS educators and professors themselves, for whom this stuff might seem trivial. Others might be completely oblivious to what GIS is. Navigating this variation is something all academics and scholars have to deal with – this would be my opportunity to do so for the first time (well, technically the second – I presented two years ago as well on some research I had done on the Shahbag Riots in Bangladesh with a professor at Amherst College...but that's another story).

Knowing this, I pitched my talk below the jargon-heavy level of AAG yet still above how I might explain my ideas to a friend over coffee. In this middle ground, I thought I might just be able to meet everyone where they were at. I started the presentation with my Acknowledgements section, which I think is a (sadly) rather unconventional model. However, drawing inspiration from Mia Mingus's keynote address at the 2011 Femmes of Color Symposium in Oakland, CA, I thought it was imperative to do this first. So many people have helped me on this journey although, ostensibly, it is an independent one. Professor Joseph Holler has patiently worked through long sections of code with me, always posing new challenges and pushing me to experiment. I relied heavily in the first half of my research on data and workflows developed by Professor Jeff Howarth and other members of the Middlebury Geography Department (including Bill Hegman, Mike Ryba, Bradley Gardener, and Patrick Culbert). And I have been inspired throughout my studies at Middlebury by Professor Kacy McKinney, who instilled in me the critical feminist politics that lie at the heart of my research. This level of support is something very few people have the fortune of receiving in life – it was only right that I acknowledge this to begin my presentation.

My second section really focused in on explaining what GIS is and how code fits into it. Here I went into more depth about different programming languages, talking about the utility of Python, SQL, and Javascript for doing all sorts of fascinating things (i.e. automated geoprocessing, custom tool design, database architecture and querying, interactive and web-based cartography, etc.). This helped to give some of the more abstract ideas later in my presentation a concrete basis. From here, I moved on to actually introducing my pedagogical framework, which involves seven separate steps:

  1. Posing a problem that geospatial technologies can illuminate or help us to understand. I am particularly attracted to problems of the social variety.
  2. Thinking about the problem first through a software-independent lens. This emphasizes considering just the actual spatial relationships we want to model.
  3. Reframing the problem through code – that is, how do we model relationships that are spatial in nature through syntax and language?
  4. Troubleshooting the inevitable errors that emerge. Because let's be honest – programming is more debugging than anything else.
  5. Abstracting the problem to imagine a scalable solution. The key here is thinking about how we can write code that is intelligent enough to respond to different data sets, different geographic scales, and different user needs.
  6. Testing our solution through different problems and broader application. Here we are obliged to ask that scary question – does our code actually hold up?
  7. Visualizing our solution and sharing the tools we used to make it. Data that tells a beautiful (or, more often, sobering) story needs good design. Moreover, if we want others to help us tell these stories, to change minds, to influence policy, and to critically engage with these same issues, then we need to document and share our tools.

I connected this framework to the process I had used during the first half of the semester while working with drug-related arrest data for the City of Chicago. And as I reflected back on my own work, on the difficulty of actually implementing this framework for my own learning, I came to a few important conclusions (all of which went into my Implications section at the end of my presentation).

The first idea comes from a geographer by the name of Sergio Rey, who brings up the notion of "seeing the code as text" (199) in his 2009 piece, "Show me the code: spatial analysis and open source." What Rey references is the idea that being able to actually examine and work with source code provides a learning experience that closely mirrors that of close reading in the humanities. I wanted to extend this idea, however, to argue that engaging with code is an even more powerful element of self-reflexivity in mapmaking, something feminist geographers like Sarah Elwood and Mei-Po Kwan write about extensively. In building something from scratch, you are forced to reckon directly with the assumptions that your creation makes, its limitations and shortcomings, and how all of these conspire to skew the story you're telling one way or another. You also have to think about how your own positionality, your worldview, affects this process from the beginning. The question then becomes: are the politics of your epistemological choices justified? How can you prevent your tools from waging epistemological violence on those you are "mapping" (as though the lived experiences of 250 people can be reduced to a point)? This point really led me to think about an idea that Mei-Po Kwan brings up in a 2002 piece on feminist visualization – that of creating "different kinds of interactions between GIS users and GIS technology." And I think this is at least partially what she means – that bringing feminist theory into traditional GIS practice should result in a new praxis, one that values multiplicity and messiness above reductive, masculinist practices and narrow, Cartesian understandings of space. Pretty heady, I know. But importantly, I think code is still a relatively unexplored avenue for doing these very things.

Another key realization of my work has been that our research should drive the development of our analytical tools rather than letting the available tools govern our research. When we limit ourselves to working within the confines of proprietary software and established methodologies, we only tell certain stories and only in very particular formats. We limit the ways we can know, understand, and represent the world. Code, I think, is our pathway out of this dilemma because it puts the creative impetus back into the hands of the researcher herself. Finally, I concluded my presentation with a short diatribe on how I think code needs to be applied more directly to social justice initiatives. Within the realm of geography and GIS the possibilities are endless – examining the spatial variation and progression of eviction in gentrifying cities, understanding the complex migration patterns of global refugees, examining the ties between proximity to environmental pollutants and communities of color. While we often think of code as having a proprietary, profit-driven, capitalist economy behind it, can we imagine a paradigm in which it is actually used to subvert this system? That possibility is something I think this work has revealed to me, and it is a paradigm I firmly believe is possible. But first, it needs to be the paradigm that we are teaching our students.

Project #4: Presenting at the American Association of Geographers (AAG) Annual Conference (or, How to Geolocate Fellow Geography Geeks)

The research process can so often feel like a solitary experience. Particularly working in the realm of technical geospatial applications and assorted programming languages, I often find myself deep down the rabbit hole, following links on endless Stack Exchange and Github pages to their very ends. And while I enjoy this process – the careful search for a technique or a block of code or an idea that will help me solve my problems – I have also noticed how it has brought me further from the original purpose of my research. From the beginning, I have wanted my work to serve as a pedagogical tool, a framework for how we might go about teaching GIS and cartography with a keener eye to the role of both code and critical feminist theory in this learning process. And in order to do that, I needed to take an important step during this, the middle of my semester – presenting my work.

And what could be a better platform than the American Association of Geographers (AAG) annual meeting? For those of you unfamiliar with the AAG, take a moment to picture over 9,000 geographers descending upon a single city for five days of lectures, panel discussions, poster presentations, demonstrations, and utter geographic bliss. AAG is one of the largest conferences for geographers in the world, and offers the unique opportunity to engage with the diversity of work being done in the scholarly geographic community. For an undergrad like myself, it's also an opportunity to get to know my chosen discipline outside of Middlebury's walls (including all of the politics, arguments, and challenges that make it rich). I was thrilled to have the support of my research advisor, Joseph Holler, the Middlebury College Department of Geography, and the Undergraduate Research Office to present my work.

Click here to access a PDF of the poster I presented at AAG. The PDF is intended for a 4' x 3' poster, so the text may appear small on your monitor – zoom in to get to the good stuff!

Owing to the content of my poster, I was placed in the GIS & Technology Poster Session on Day 2 of the conference. This gave me a four-hour block to field questions, ask questions, meet fellow researchers interested in GIS, and contribute my few, small ideas to the broader geographic community. To my delight (and surprise), there was a significant amount of interest in my poster. The vast majority of attendees were GIS educators, many of whom had been struggling themselves with this quandary of bringing code, theory, and GIS together in a harmonious way. A lot of educators seemed to agree with my argument that we need to move away from prescriptive, technique-driven pedagogical strategies (i.e. here's where the buttons are and here's how to push them in the right order) towards a more applied approach that values multiple solutions, creative design, critical methodologies, and the integration of GIS work with broader social justice work. But they also pushed back, contending that this learning style feels to many of them like a still unattainable dream.

Presenting at AAG 2016 in beautiful San Francisco, CA as part of the GIS & Technology Poster Session.

In particular, a lot of educators relayed that their students tend to have strong preconceived notions about their abilities to code, all of which play into how they do in the classroom. While some students love the creative problem-solving inherent in programming (and thrive doing it), others shut down at the thought of writing even basic commands. It is this dichotomy that makes bringing code into the GIS lab such a difficult and messy experiment, particularly when its introduction often creates and reproduces broader structures of dominance in the academic world (i.e. quantitative methods > qualitative methods, positivism > multiple truths, reductive theoretical approaches > descriptive theoretical approaches, solutions > critique).

I also spoke with other educators about the role of code as a social notion connected to specific gendered, racialized, and class binaries. When many of us think of code, we may initially jump to two conclusions: 1) that those writing code are (primarily) young, white, privileged men, and 2) that code is about capital, Silicon Valley, Google buses in the Mission – in short, proprietary, closed-source development. Depending on how you identify, this social understanding of code can (obviously) feel exclusionary. The private sector certainly hasn't done much to combat this stereotype – in 2014, 61% of Google's workforce identified as white, and 70% identified as male. Only 2% identified as black, 3% as Hispanic, and 30% as female (PBS Newshour). Part of my research, then, has also been to reflect on how introducing code into the way we teach GIS will necessarily involve an engagement with these disparities, with the racialized, gendered, and classed dimensions of code.

I also think changing this paradigm will entail changing the contexts for, and the spaces of, coding. Towards this end, I think GIS is a fascinating place to start. First, GIS (unlike, say, software development) is not an end in and of itself, but rather a means of understanding data through a spatial lens. Its role is to influence the way people think, act, and understand their world. In this way, GIS (and by extension, code) needs theory to obtain meaning; in the same way, theory needs GIS (and code) to obtain application. Second, I think code and GIS have tremendous potential for social justice work. A few groups are getting on to this now (i.e. Azavea's Summer of Maps program, UChicago's Data Science for Social Good, Code for America), but it seems to me (and to the educators I spoke with) that there is still an unexplored relationship between code and non-profit work. Finally, GIS is moving in the direction of open. And what I mean here is that the development community in the geospatial world is latching onto the idea of sharing not only their tools, but their manuals, their expertise, and their experience with anyone possessing an internet connection and a curiosity. This type of open, inclusionary space (which still, of course, can be exclusionary in other ways) is getting towards a new paradigm of code, one that encourages collaboration, improvement, improvisation, and shared reward. And I feel that it is within this space, in the bridges between code, GIS, and social justice, that a new paradigm of coding might emerge – a new praxis that values multiple solutions, creative design, social impact, and reflexive practice. This is the type of space that I dream of coding in one day.

Ultimately, I couldn't have been happier with my experience at AAG. Being around geographers for 9 hours a day confirmed that I have found my disciplinary home, that I really would rather be no place else. If you've gotten all the way through this piece and you're wondering 'Who the hell planted all of these crazy ideas in Parker's head?' check out work by a few of my favorites (who I saw speak at AAG), including Mei-Po Kwan, Sarah Elwood, Nadine Schuurman, Agnieszka Leszczynski, and Matthew Wilson.

Project #3: Writing Documentation for the AWR Custom Tool (or, How to Make Your Design Process Intelligible)

Having created a custom Python geoprocessing tool to automate the process of areal-weighted reaggregation (AWR), my next task was to write documentation articulating how the tool actually works. Documentation is obviously a critical part of development – without it, the tools we create can't readily be accessed by others. These things we've poured our heart and soul into have no broader application in the world.

Having cruised around the open source GIS world these first six weeks of the semester, I can tell you that the documentation out there really varies. Some folks (i.e. Leaflet, Mapbox, and CartoDB) spend considerable time making their documentation readable, user-friendly, and accessible. Other developers, particularly those who build more specialized tools and work in the deep end of computing, write documentation that assumes a considerable amount of prior knowledge. And of course, there are other developers who include just absolutely inscrutable stuff. The point here is that, just like any writing, documentation is about knowing your audience. This will influence how you structure your docs, the lexicon you use, and what background expertise you will assume.

Click here to access the documentation for my areal-weighted reaggregation (AWR) tool.

When I was thinking about my own audience for the AWR documentation, I was conscious of the fact that I developed the tool in the Esri ArcGIS framework. With this in mind, I felt it was ok to assume a basic knowledge of ArcGIS or, at the very least, some GIS framework in which there exist tools, inputs, and outputs. I did, however, want to assume less knowledge when it came to Python scripting, creating new toolboxes, and pulling an external script for use in your own project. These are more complicated procedures that require time to learn – having been on the other side of this line just a few weeks ago, I know how intimidating it can be to approach documentation that requires a computer science degree to comb through. Therefore, the documentation walks users through the process of acquiring, setting up, and running the script in their own projects. Another important aspect of my documentation involves providing an explanation of what areal-weighted reaggregation (AWR) actually is. The technique is confusing conceptually, especially for beginning GIS users, and having a clear, concise explanation of what the script is doing with your data is, I think, crucial. Finally, I wanted to include a Limitations section in my docs, per the suggestion of my advisor Joseph Holler. Acknowledging where your tools fail, where they make questionable assumptions, and where they could be improved is critical to responsible knowledge-making.

Ultimately, I hope this documentation helps folks use and improve upon this tool going forward. I tried to keep the documentation concise and readable while still giving enough content to comprehensively explain what I've created. If you have any thoughts on how I could improve this documentation, or the tool itself, please don't hesitate to get in touch on the Contact page.

Project #2: Creating Custom Geoprocessing Tools with Python Scripts (or, How to Bring Your Wildest Dreams to Life)

While I think many GIS analysts cherish Python for its ability to automate large-scale geoprocessing tasks and do a lot of the heavy lifting when it comes to data management, I find that its true creative power lies in creating custom geoprocessing scripts. So just to back up for a moment, we know that all GIS applications are based on the idea of passing spatial files (i.e. shapefiles, rasters, feature classes) through script tools that do something to these files. These scripts, nowadays, are often written in Python, especially as Esri has come to build many of its own geoprocessing tools in the language. For most GIS users, however, script tools operate almost like a black box – files are passed into them and new files pop out. While we all have an understanding of what the input needs to look like and what the output should look like, what happens in the middle remains fairly opaque. This opacity frustrated me when I first began using GIS. I wanted to know more about what was actually going on "under the hood" and to be able to manipulate that process to make different things happen.

So for this second project, I posed a new challenge to myself – to create a custom geoprocessing tool and, in the process, learn more about how geoprocessing tools actually function. My professor suggested that I try making a tool that would tackle a somewhat tricky (though regularly used) technique in GIS, something known as areal-weighted reaggregation (AWR). The goal of AWR is to get data that is aggregated by one areal unit and reaggregate this data by a different areal unit, particularly for the case when the areal units are not nested. The script I wrote for Project #1 uses areal-weighted reaggregation, taking Census data that gives the number of people of each race in each block group and reaggregating it by the Chicago Police Department's beats (patrol areas). This is ultimately what allows us to relate the density of arrests for possession of certain drugs in each beat to the racial makeup of that same beat.
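To make the core arithmetic concrete, here is a tiny, made-up example of the weighting at the heart of AWR (the numbers and names below are invented for illustration – they are not the Chicago data):

# A toy illustration of areal-weighted reaggregation. Each block group's
# population is split among the police beats it overlaps, in proportion to
# the share of the block group's area that falls inside each beat.

# (block group id, population, {beat id: fraction of block group area in that beat})
block_groups = [
    ("BG1", 1000, {"Beat A": 0.6, "Beat B": 0.4}),
    ("BG2", 500, {"Beat B": 1.0}),
]

beat_totals = {}
for bg_id, pop, overlaps in block_groups:
    for beat, area_ratio in overlaps.items():
        # population assigned to a beat = population * (intersect area / block group area)
        beat_totals[beat] = beat_totals.get(beat, 0) + pop * area_ratio

# beat_totals is now {"Beat A": 600.0, "Beat B": 900.0}

The actual tool does exactly this, except the area ratios come out of an Intersect analysis and the multiplication happens on fields in the attribute table.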

So why is creating a tool to do this process actually useful? Well, areal-weighted reaggregation is a surprisingly common technique that needs to be performed regularly when working with spatial data. All too often, we have data aggregated by one areal unit, but we need it aggregated by a different areal unit. By creating a custom tool to do this process, we can use it on any problem that has the required inputs. For example, take the Chicago map I made in Project #1. With a custom AWR tool, we could make the same type of map for every city in the United States, or for every major city in the world (assuming reliable and extant data – which you really can never assume). Or let's say you're interested in completely different statistics (age, gender, socioeconomic class, education, etc.) or completely different scales of analysis (i.e. counties, states, countries, continents). Again, this tool can do that type of analysis as well. The power of creating custom tools is in their diversity of application, their ability to be applied to many different contexts and problems. The script from Project #1 is limited in the sense that it only works for one particular case (singularity of application).

Alright, now for the goods. Click here to access my script for the Areal-Weighted Reaggregation tool. So, a little explanation of the tool itself. The tool has a fair number of parameters that need to be entered (because, well, it does a fair number of things). These include the following – I've also sketched how they map to the script's parameter indices just after these descriptions:

Workspace – where are you storing your files that you're working with? The script needs to know this in order to know where to look on your computer to get at the files of interest.

Data Feature Class – the feature class holding your data of interest. This is the file that contains the aggregated data that you will be reaggregating. For example, if you have data in block groups and you want to get it by police beats, your block groups feature class would be the input for this parameter.

Areal Unit Feature Class – the feature class holding your areal unit of interest. This is the file that contains the areal units by which you are reaggregating. For example, if you have data in block groups and you want to get it by police beats, your police beats feature class would be the input for this parameter.

Dissolve Field – the field in your areal unit feature class that uniquely identifies each areal unit in the feature class. Your data will be reaggregated (dissolved) based on this field. For example, in your police beats feature class, there is likely a field (i.e. BEAT_NUM) that uniquely identifies each police beat. This would be the input for this parameter.

Intersect Feature Class – the intersect feature class created as part of the areal-weighted reaggregation. This is one of two outputs created by the tool and includes the result of an intersect analysis between your data feature class and areal unit feature class. This feature class isn't necessarily imperative for analysis, but it does have all sorts of applications within and beyond running an AWR. For this parameter, you provide a name and the script will write the intersect feature class with that name.

Statistic Field(s) – the field(s) that you actually want to aggregate (SUM) in the output feature class. These fields are the data of interest contained in your data feature class. For example, if I wanted to aggregate the number of Black, White, Hispanic, and Asian people by police beat, I would select fields "BLACK", "WHITE", "HISPANIC", and "ASIAN" for this parameter. Note that this parameter can be any numerical input (it would likely not work for strings, although I can't say I've tested that).

Output Feature Class – the output of our areal-weighted reaggregation. The output feature class of our AWR will provide the statistics specified in the parameter above reaggregated by the areal units specified in the Dissolve Field parameter. These statistics will appear with a prefix "SUM_" and a suffix "_awr". For example, performing an AWR on the Chicago data outputs the following fields in the output feature class: "SUM_WHITE_awr", "SUM_BLACK_awr", "SUM_HISPANIC_awr", "SUM_ASIAN_awr", "SUM_MULT_RACE_awr."
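For quick reference, here is how those parameters line up with the GetParameterAsText() index positions in the full script at the end of this post (this assumes the script tool's parameters are defined in this same order; the variable names here are just for readability):

import arcpy

workspace      = arcpy.GetParameterAsText(0)   # Workspace
feature_data   = arcpy.GetParameterAsText(1)   # Data Feature Class
areal_unit     = arcpy.GetParameterAsText(2)   # Areal Unit Feature Class
dissolve_field = arcpy.GetParameterAsText(3)   # Dissolve Field
intersect_fc   = arcpy.GetParameterAsText(4)   # Intersect Feature Class (output)
stat_fields    = arcpy.GetParameterAsText(5)   # Statistic Field(s), passed as a semicolon-delimited string
output_fc      = arcpy.GetParameterAsText(6)   # Output Feature Class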

To ensure that the tool actually worked on more than just the Chicago data, I ran it on a similar dataset for North Carolina and Indiana. In this problem, I used my custom AWR script to take data for the number of black and white people in each Census block group and reaggregate it by house district. The results are pictured below.

Notice the fields in the attribute table "SUM_WHITE_awr", "SUM_BLACK_awr", etc. These are the outputs of the tool and are visualized on the map on screen. Notice as well that I also included the number of males and females as statistics in my AWR – you can imagine, however, that the distribution of men and women might be more homogeneous than that of race.

Ah, the sweet feeling of success when a custom tool runs correctly. Notice too that the tool is quite a behemoth – it took over 5 minutes to run on this data set. Keep in mind that as the number of features in your feature classes increases, so does the amount of time it will take for the script to run.

Here is the good ol' Chicago map once again. This time, however, I used my custom AWR tool to perform the analysis rather than the specific script from Project #1. Comparing the two attribute tables, we see that both the script from Project #1 and the AWR tool produce the same results. The GUI on the left is the AWR tool come to life, and the one on the right shows the Properties of the AWR tool.

Ultimately, it was a remarkably satisfying experience to write this tool and see it come to life. It was also a terrific lesson in learning how to abstract a specific problem into something that can be more universally applied. If you have further questions about how this tool works or how to get it to work in your own ArcMap project, feel free to get in touch on the Contact page. Check back in a few weeks for an update on my upcoming work with relational databases and my presentation at AAG! Below is the full code written out.

# Name: ArealWeightedReaggregation.py
# Description: Perform an areal-weighted reaggregation given two input polygon vectors. Statistics and dissolve fields must be specified in parameters.
# Author: Parker Ziegler

# Import system modules.

import arcpy
import os

# Set environments and output workspace.

from arcpy import env
env.overwriteOutput = True
env.workspace = arcpy.GetParameterAsText(0)

# Set local variables to the polygon feature class with the data of interest (feature data) and the polygon feature class with the new areal units (areal units). Specify the dissolve field in the areal units feature class that will be used to aggregate the data.

feature_data = arcpy.GetParameterAsText(1)
areal_unit = arcpy.GetParameterAsText(2)
dissolve_field = arcpy.GetParameterAsText(3)

# Add a field called "AREA_M" that specifies the area of each polygon in the polygon feature class with your data of interest (feature data). This will be used in a later step to calculate an area ratio.

arcpy.AddField_management(feature_data, "AREA_M", "DOUBLE", 10)
arcpy.CalculateField_management(feature_data, "AREA_M", '!shape.geodesicArea@squaremeters!', "PYTHON_9.3")

# Intersect the feature data polygons with the areal units polygons.

intersect_fcs = [feature_data, areal_unit]
out_fcs = arcpy.GetParameterAsText(4)
arcpy.Intersect_analysis(intersect_fcs, out_fcs)

# Now, create a field called "AREA_RATIO" that will divide the areas of the intersected polygons by the areas of the feature data polygons.

arcpy.AddField_management(out_fcs, "AREA_RATIO", "DOUBLE", 10)
arcpy.CalculateField_management(out_fcs, "AREA_RATIO", '!shape.geodesicArea@squaremeters!/!AREA_M!', "PYTHON_9.3")

# Next, multiply this ratio by the statistic(s) of interest. This will calculate the value of the statistic(s) for each intersect polygon.

# First, create a field(s) that will hold each new statistic.

# Set local variables.
calc_fields = arcpy.GetParameterAsText(5)
fieldList = calc_fields.split(";")
fieldList.sort()
field_type = "DOUBLE"
field_length = 10

for field in fieldList:
    arcpy.AddField_management(out_fcs, field + '_awr', field_type, field_length)

# Now, calculate the values for all of the new fields using the ratio created above and populate the fields created from the previous step.

awr_fields = []
awrList = arcpy.ListFields(out_fcs, "*_awr")
for field in awrList:
    awr_fields.append(field.name)
awr_fields.sort()
i = 0
for field in awr_fields:
    arcpy.CalculateField_management(out_fcs, awr_fields[i], '!' + fieldList[i] + '!' + '*!AREA_RATIO!', "PYTHON_9.3")
    i += 1

# The last step in the AWR is to dissolve our intersected features along the boundaries of the new areal units. This will reaggregate our data by these new areal units.

# Define variables.

in_features = arcpy.GetParameterAsText(4)
out_features = arcpy.GetParameterAsText(6)
field_names = []
stat_fields = arcpy.ListFields(in_features, "*_awr")
for field in stat_fields:
    field_names.append(field.name)
# Sort the field names themselves (rather than the Field objects) so the
# statistics are assembled in a predictable order.
field_names.sort()
statList = [field + " SUM" for field in field_names]
dissolveStats = "; ".join(statList)

# Run the Dissolve tool.

arcpy.Dissolve_management(in_features, out_features, dissolve_field, dissolveStats)

Project #1: Python Scripting for Automated Geoprocessing (or, How to Make Magic Happen)

When I first developed the syllabus for my senior research, I spent a lot of time asking fellow GIS students, professors, and analysts where they thought I should begin my study. And nearly everyone gave me the same answer – Python scripting.

Python scripting in GIS is primarily desirable because it allows you to automate almost all of the geoprocessing tasks you do in GIS without so much clicking. Rather than moving tediously from one GUI (Graphical User Interface, those pop up boxes where you tell a tool what files and parameters to use) to the next or attempting to build a flawless model from scratch, a Python script allows you to write code that is flexible, dynamic, and most importantly, adaptable to a variety of contexts. Let's say, for example, that you needed to project 1000 feature classes or run a particular technique on a group of shapefiles 500 times. Doing this manually would likely take days, if not weeks, of work. Conversely, a Python script allows you to direct the computer to a group of files and say, "Hey, could you bring these files into a GIS, run x, y, or z operations on them, and tell me what happens?" For me, Python scripts evoke a sensation of performing magic (particularly when layers you only conceived of creating actually start appearing on screen – ok, that's it for the meta stuff for now).
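To give a flavor of what this looks like in practice, here is a minimal ArcPy sketch of that kind of batch job – reprojecting every feature class in a workspace in one pass. The workspace path and output coordinate system are placeholders, so swap in your own:

import arcpy
from arcpy import env

# Point ArcPy at a workspace (placeholder path) and allow overwrites.
env.workspace = r"C:/Data/MyProject.gdb"
env.overwriteOutput = True

# Reproject every feature class in the workspace to a common coordinate system.
outCS = arcpy.SpatialReference("NAD 1983 UTM Zone 16N")
for fc in arcpy.ListFeatureClasses():
    arcpy.Project_management(fc, fc + "_project", outCS)
    print fc + " projected."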

Another beautiful aspect of Python scripting is the ability to add a lot of creativity to your own geospatial analysis. Before I started scripting with Python, I didn't realize how much unseen functionality the folks at Esri and the QGIS Project had programmed into their geoprocessing tools, how many little details are taken care of for you. On the one hand, this is part of what makes GIS a lot more user-friendly and approachable nowadays – you don't have to worry too much about the nitty-gritty and you don't need to have a programming background to get started. However, I also think working exclusively with the built-in functionality of ArcGIS or QGIS confines you somewhat to very specific techniques or routines (which, ultimately, produces an inundation of very similar types of maps).

So, for my first project, I decided to tackle what felt like an ambitious challenge for a self-taught programmer like myself. Could I write a Python script that would make this map?

I made this map towards the end of last semester using data and a workflow courtesy of Professor Jeff Howarth in the Middlebury Geography Department. I did deviate in considerable ways from the original method for making this map, and dressed it up with my own cartography, but it was in many ways a standard GIS procedure for me. I used GUIs, Esri geodatabases, all of the built-in functionality I could get my hands on. The challenge now would be to see if I could do the same without so much support.

Fortunately, I did stumble upon a solid book to help me develop my scripting capabilities quickly, Python Scripting for ArcGIS by Paul A. Zandbergen. The book really focuses on using the ArcPy site package with ArcGIS to write effective, efficient code. It acts more as a conceptual skeleton than a how-to guide, giving you the initial tools that you need to start exploring ArcPy and developing your own scripts. For this first project, I really wanted to stick with ArcPy because 1) it has a whole slew of solid methods and functions custom built for geoprocessing, 2) it works seamlessly with standard Python functions (i.e. you can nest 'regular' Python functions into larger ArcPy functions), and 3) there's well-written, easy-to-use documentation available from Esri. At the same time, there is some dissonance between the stated goals of my research and the choice to use ArcPy. It isn't open source (it is built directly into ArcGIS, a very expensive piece of software), it has a good deal of custom functionality that still obscures what's going on "under the hood", and code produced using ArcPy can't be run by folks not using the Esri suite of products. And so I want to take a moment to acknowledge these benefits and drawbacks in my decision, and to recognize that this tension between convenience and accessibility is palpable in the GIS world today and demands further investigation. I will continue to do that investigation throughout this semester. Now, on to the results!

Click here to access the script I used to make this map. Note that this script is specific to the files and databases written to my personal drive at Middlebury. If you want to replicate this map, you will have to grab the data yourself (don't worry, I give you some sources where you can find it in the comments at the beginning of the script).

This script took a fair amount of troubleshooting to actually work through. ArcPy requires very specific syntax and does not like to perform operations on conflicting data types (trust me, I found out the hard way). This is true of most programming languages, but what is perhaps more difficult with ArcPy is that you have to do a fair amount of data type conversion and concatenation to actually get seemingly simple calculations to go through. It's also difficult to iterate through individual feature classes to do specific operations on each of them. You'll notice that I used multiple for loops with lists and indexes to take care of this problem; you could also use cursors, although I find these to be a bit more cumbersome.
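To illustrate, here are two versions of the same field calculation from the Chicago workflow – the first mirrors the lists-and-indexes (and string concatenation) approach my script uses, while the second is a hypothetical alternative using an arcpy.da.UpdateCursor:

import arcpy

out_features = "blk_grps_beats_intersect"
fieldList = ["black_pop", "white_pop", "hisp_pop", "asian_pop", "mult_pop"]
calcList = ["BLACK", "WHITE", "HISPANIC", "ASIAN", "MULT_RACE"]

# Option 1: lists and an index, building the expression string by concatenation.
i = 0
for field in fieldList:
    expression = '!' + calcList[i] + '!' + '*!area_ratio!'
    arcpy.CalculateField_management(out_features, fieldList[i], expression, "PYTHON_9.3")
    i += 1

# Option 2: an update cursor, which avoids the expression strings entirely
# but (to my eye) reads as more cumbersome.
fields = calcList + ["area_ratio"] + fieldList
with arcpy.da.UpdateCursor(out_features, fields) as cursor:
    for row in cursor:
        for j in range(len(fieldList)):
            row[6 + j] = row[j] * row[5]   # e.g. black_pop = BLACK * area_ratio
        cursor.updateRow(row)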

Getting this script to work was a remarkably satisfying moment for me. The only items the script starts with are a .mxd file (ArcMap Document file), a shapefile of police beats for the city of Chicago, a shapefile of all Census block groups for the United States, and a .csv file (Comma Separated Value file) with XY coordinate data for arrests for possession of crack cocaine. This means that the script takes care of:

  • Setting Environments
  • Creating a file geodatabase
  • Performing a select by location to get block groups only for the city of Chicago
  • Making an XY Event Layer to display the crime data and writing this data to a new feature class
  • Converting all shapefiles to feature classes in a geodatabase
  • Projecting all feature classes
  • Performing a spatial join
  • Running an areal-weighted reaggregation to get the number of people of each race by police beat across the city (this is the entire second half of the script and encompasses about 8 separate operations...but more on this next post)

My hope is that in making this script available I can help others to write similar scripts that also take on the challenge of visualizing social injustice. While I am proud of the accomplishment, I am more excited about the story that this map tells – one of racially biased policing in a city that has seen too much violence. And so I am reminded that this thing I have made, this script, is only really as valuable as the story it helps to tell and the minds it helps to change. Below, you will find the map that this script produces (Note: you will have to do the symbology manually to get your map to look like mine).

And here is the code, fully written out, that made this map.

# NAME: AWR_Script
# Description: Run an areal-weighted reaggregation (AWR) on crime data and racial Census data for the city of Chicago given a .csv file of crimes, a shapefile for Census block groups, and a shapefile for CPD police beats.
# Author: Parker Ziegler

# Preparatory Steps
# Create a .mxd file to hold your map (i.e. "Chicago_Arrest_Map.mxd")
# Create a .csv file of the point data you're interested in. In this case, I used data showing the locations of arrests for crack cocaine possession in Chicago, courtesy of the Chicago Data Portal.
# You'll need to acquire shapefiles of the City of Chicago police beats as well as Census data. These can be accessed via the Chicago Data Portal and census.gov, respectively.
# Place all files and their metadata into your workspace folder.

# Import system modules.

import arcpy
import os

# Set environment workspace, enable file overwrite.

from arcpy import env
env.workspace = r"W:/GG700/Chicago_Arrests"
env.overwriteOutput = True
print "Env is set to Chicago_Arrests."

# Create a file geodatabase to store your data

out_folder_path = r"W:/GG700/Chicago_Arrests"
out_name = "Chicago_Arrest_Data.gdb"
arcpy.CreateFileGDB_management(out_folder_path, out_name)
print "Geodatabase created."

# Before starting analysis, we need to clean up our Census layer. In order to run the SelectLayerByLocation tool we first have to create a layer file from our Census shapefile.

arcpy.MakeFeatureLayer_management("block_groups.shp", "block_groups_lyr")
print "Census layer created."

# Perform a select by location on the Census data to get only the blocks groups that intersect the CPD police beats.

arcpy.SelectLayerByLocation_management("block_groups_lyr", "INTERSECT", "police_beats.shp")
print "Intersecting features selected."

# Write these block groups to a new shapefile.
arcpy.CopyFeatures_management("block_groups_lyr", "chicago_blk_grps.shp")
print "Features copied to shapefile."

# Display the Crack_XYData.csv as points and write a point feature class.
 
# Set variables.

in_table = "Crack_XYData.csv"
x_coords = "Longitude"
y_coords = "Latitude"
out_layer = "crack_as_points"
saved_layer = "crack_as_points.lyr"
sr = arcpy.SpatialReference("WGS 1984")

# Make the XY Event Layer

arcpy.MakeXYEventLayer_management(in_table, x_coords, y_coords, out_layer, sr)
print "XY Event Layer created."

# Create an output layer file (.lyr) file.

arcpy.SaveToLayerFile_management(out_layer, saved_layer)
print "Layer file created."

# Copy the layer file to a shapefile (.shp).

arcpy.CopyFeatures_management(saved_layer, "crack_as_points.shp")
print "Shapefile successfully created."

# Move your "crack_as_points.shp", your "police_beats.shp", and your "chicago_blk_grps.shp" into your file geodatabase.

in_features = ["crack_as_points.shp", "police_beats.shp", "chicago_blk_grps.shp"]
out_gdb = r"W:/GG700/Chicago_Arrests/Chicago_Arrest_Data.gdb"
arcpy.FeatureClassToGeodatabase_conversion(in_features, out_gdb)
print "Shapefiles have been converted to feature classes in the geodatabase."

# Now, get all your feature classes into the same spatial reference.

env.workspace = out_gdb
infcList = arcpy.ListFeatureClasses()
infcList.sort()
outfcList = ["chicago_blk_grps_project","crack_as_points_project", "police_beats_project"]
outCS = arcpy.SpatialReference("NAD 1983 UTM Zone 16N")
i = 0
for infc in infcList:
    arcpy.Project_management(infcList[i], outfcList[i], outCS)
    i += 1

print "All layers are in the same coordinate system."

# By now, your three features classes should be in the same coordinate system. Now, perform a spatial join to get the number of crimes in each police beat.

# Join "crack_as_points_project" (points) to "police_beats_project" (polygons).
# Define variables.

target_features = "police_beats_project"
join_features = "crack_as_points_project"
outfc = "police_beats_crack_joined"

# Run the Spatial Join tool.

arcpy.SpatialJoin_analysis(target_features, join_features, outfc)
print "Crack points have been joined to police beats."

# Performing a spatial join will automatically create a field labeled "Join_Count"; use this field, as well as area information from your police beats feature class, to create a new field that gives arrests / sq. km.

# Add fields for police beat geometry and crime density.

arcpy.AddField_management("police_beats_crack_joined", "area_beats", "FLOAT", 9)
arcpy.AddField_management("police_beats_crack_joined", "arrest_dns", "FLOAT", 9)
print "Two fields have been added."

# Calculate geometry, use this value to calculate a crime density for each police beat.

arcpy.CalculateField_management("police_beats_crack_joined", "area_beats", '!shape.geodesicArea@squarekilometers!', "PYTHON_9.3")
arcpy.CalculateField_management("police_beats_crack_joined", "arrest_dns", 'float(!Join_Count!)/float(!area_beats!)', "PYTHON_9.3")
print "Crime density field has been calculated."

# Write this joined value to a new feature class, as joins are stored only in memory.

arcpy.CopyFeatures_management("police_beats_crack_joined", "arrest_density")
print "Arrest Density feature class written to geodatabase."

# You have now successfully created the choropleth layer representing arrest density for possession of crack cocaine by CPD police beat. The next part of the workflow involves performing an areal weighted reaggregation on our Census data to obtain the number of people living in each police beat by race.

# Calculate the area of each Census block group.

arcpy.AddField_management("chicago_blk_grps_project", "a_blk_grps", "DOUBLE", 10)
arcpy.CalculateField_management("chicago_blk_grps_project", "a_blk_grps", '!shape.geodesicArea@squarekilometers!', "PYTHON_9.3")
print "Area of each block group calculated."

# Intersect the Census block groups with the police beats.

in_features = ["chicago_blk_grps_project", "police_beats_project"]
out_features = "blk_grps_beats_intersect"
arcpy.Intersect_analysis(in_features, out_features)
print "Intersect completed."

# Now, create a field that will divide the areas of the intersected features by the areas of the Census block groups.

arcpy.AddField_management(out_features, "area_ratio", "DOUBLE", 10)
arcpy.CalculateField_management(out_features, "area_ratio", '!shape.geodesicArea@squarekilometers!/!a_blk_grps!', "PYTHON_9.3")
print "Area ratio calculated."

# Next, multiply this ratio by the number of Black, White, Hispanic, Asian, and Multiple Race people in each Census block group. This will give us the number of people of each race in each intersect feature.

# Add five fields, one for each race.

fieldList = ["black_pop", "white_pop", "hisp_pop", "asian_pop", "mult_pop"]
field_type = "DOUBLE"
field_length = 10
for field in fieldList:
    arcpy.AddField_management(out_features, field, field_type, field_length)
print "All fields for racial data added."

# Calculate the values for all five fields using the ratio created above and the Census data included from the intersect.

calcList = ["BLACK", "WHITE", "HISPANIC", "ASIAN", "MULT_RACE"]
i = 0
for field in fieldList:
    arcpy.CalculateField_management(out_features, fieldList[i], '!' + calcList[i] + '!' + '*!area_ratio!', "PYTHON_9.3")
    i += 1
print "All fields for racial data calculated."

# The last step in the AWR is to dissolve our intersected features along the boundaries of the police beats. This will reaggregate our racial data by police beats, which we will represent using a dot density.

# Define variables.
 
in_features = "blk_grps_beats_intersect"
out_features = "race_by_beats"
dissolve_field = "BEAT_NUM"
stats_fields = "black_pop SUM; white_pop SUM; hisp_pop SUM; asian_pop SUM; mult_pop SUM"

# Run the Dissolve tool.

arcpy.Dissolve_management(in_features, out_features, dissolve_field, stats_fields)
print "Dissolve completed."

# The script is now complete. The final map should be composed of two layers – "arrest_density" (a choropleth density of arrests for possession of crack cocaine) and "race_by_beats" (a dot density representation of the racial make up of Chicago). You may also want to add a base map depending on your purposes.
# Use the symbology GUI to change the symbology and set boundaries as you wish. I recommend doing most of your design work in Adobe Illustrator.

Senior Research in Geography – GEOG 0700: Programming for GIS

"How can we make GIS do something different?"

Professor Sarah Elwood from the University of Washington posed this question to me one November afternoon towards the end of the fall semester, as we discussed her research on feminist GIS practice over Skype. At the time, I was working on a podcast for my senior seminar, Feminist Geography, that asked the question, "What does it mean to make maps in a feminist / critical way?" After a healthy literature review and two other interviews with experts in the field, I was still struggling to answer this question. I knew that feminist GIS had something to do with questioning how cartographic knowledge is made, with challenging the reductive technique of representing human experiences of space as points, lines, and polygons, and with breaking the boundary between cartographers and the subjects (objects?) of their maps. But it wasn't until I spoke with Professor Elwood that I received the insight I craved. The goal of all this was related to one simple idea: that we could be doing GIS differently.

Fast forward to the start of the spring semester, and here I am trying to figure out what, exactly, that might look like. What would it mean to do GIS differently? What tools would we use? Hadn't all of the tools been invented already? Where would I start?

Well, the answer to that last question was somewhat obvious: the internet. As I perused the methods sections of different papers by feminist and critical geographers, and spent hours trying to figure out how certain maps and visualizations were made, I began to realize that these cartographers were using more than your standard geoprocessing toolkit – they were using code. I had known for a long time that GIS people and computer scientists had a lot to talk to each other about, but I had never realized the full extent of the connection. Particularly when I discovered the open source world led by CartoDB, Mapbox, the QGIS Project, Development Seed, Leaflet, the PostGIS folks, and all of the other hundreds (perhaps thousands) of developers out there working on this same problem, my mind began to churn in excitement. I could see a new bridge forming, a connection whereby code, GIS, feminist epistemologies, and smart design could come together to make maps that truly did something different. My next task was to figure out how I could do that myself.

Here, in my last semester at Middlebury, I have decided to dedicate my capstone senior research to this exploration. With the excellent guidance of my faculty advisor, Professor Joseph Holler, I will be pursuing an extensive and ambitious self-designed syllabus to learn the coding techniques necessary to make new kinds of maps. This will, of course, involve a lot of making "old kinds" of maps, of experimenting with traditional techniques to see where I might be able to do things differently, to deviate from the master plan. The ultimate goal of this research is to produce a framework for how we might go about learning (and teaching) GIS differently, using a code-based approach that emphasizes creativity, novelty, universality, and smart design alongside a critical epistemology of self-reflexivity, positionality, accountability, and social justice. In this pursuit, I hope to cultivate both technical and philosophical skill as a young cartographer and developer. I am a self-taught programmer, and have become adept at coding largely through determination, commitment, and sheer stubbornness (and let's of course be sure to mention the amazing pooled community knowledge on Github and Stack Exchange). I am passionate about making maps that tell impactful stories and empower marginalized voices. And I want to dedicate my life to changing how we think about cartographic knowledge and the process of making it. I hope, in this senior research, I can begin to do just that.

To view a preliminary syllabus of the research, click here.