Posts

In the marketing world, big data is used to answer ostensibly minute questions every day: are computer mouse movements predictive of purchasing? Does an orange background increase user engagement? In every place with Silicon in its name, there are teams of data scientists asking these questions. In the social sector, by contrast, answering helpful questions is more difficult. For instance, is our program reducing homelessness? How is health spending distributed across the state?

Edit, 3/28/18: RStudio just announced Python interoperability through the reticulate package. Rmd Notebooks are unbeatable, in my opinion.

Original Post: I started using Jupyter Notebooks back when they were called IPython. I even remember having to set up a virtual Linux environment because they were not available on Windows. As much as I have enjoyed their functionality, I recently switched entirely to R Markdown in an RStudio environment. Here’s why.

Kaggle is a forum for interacting with other data scientists and competing to see who can write code that will best predict features of data. It’s a way to test your skills at statistics and machine learning, and to do a lot of human learning in the process (sorry, bad pun). When I entered the contest to categorize crimes that occurred in San Francisco, my initial goal was to do better than random chance.

R has been the perfect language for the back end of this government data dashboard I am developing. It has excellent packages to pipe in data from every significant source. Tools like dplyr and tidyr make cleaning and munging data trivial. It is ideal for automating analysis. In the R script that powers my dashboard, I have everything from simple averages and frequency tables to a complex algorithm that converts time-series figures to Z-scores and then selects the top 3 variables to display based on standard scores from the last 7 days.
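
The Z-score selection step described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the R script the post refers to, and it is not the dashboard's actual code. The function names, the choice of population standard deviation, and the ranking rule (largest absolute Z-score within the last 7 observations) are all assumptions made for the example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a series against its own mean and standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # A perfectly flat series has no variation to standardize.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def top_variables(series_by_name, window=7, k=3):
    """Return the k variable names with the most unusual recent values.

    Assumed ranking rule: the largest absolute z-score observed in the
    last `window` points of each series.
    """
    ranked = []
    for name, values in series_by_name.items():
        recent = z_scores(values)[-window:]
        score = max(abs(z) for z in recent)
        ranked.append((name, score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical daily counts, invented purely for illustration.
    daily = {
        "pothole_requests": [1.0] * 13 + [10.0],
        "parking_tickets": [1.0, 2.0] * 7,
        "permit_filings": [5.0] * 14,
        "noise_complaints": [10.0] + [1.0] * 13,
    }
    print(top_variables(daily))
```

Standardizing each series first puts variables with very different scales on a common footing, so a spike in a low-volume metric can outrank routine movement in a high-volume one.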

As I did last year, I went through several of my favorite sites and curated what I consider to be the best writing on urban issues from 2015. One thing I love about planning, and that drew me to the profession in the first place, is that it encompasses many skills and areas of interest. I think that diversity is reflected in this year’s list. Caveat emptor: I use the term planning loosely.

Mayor Curtatone and I recently returned from the Smart Cities Expo in Barcelona, Spain, where we unveiled a new partnership between the City of Somerville and the car manufacturer Audi. We will be testing how autonomous vehicles work in an actual urban environment. The prospect of driverless cars dominating city streets is in the realm of what Steven Johnson calls the “adjacent possible.” Uber just made headlines by purchasing a large chunk of Carnegie Mellon’s robotics department.

A friend recently emailed a group of us to say that his opinion is indeed backed up by data: Star Wars Episode 3, “Revenge of the Sith”, is better than Episode 6, “Return of the Jedi.” Like most right-headed people, I disagree. While I am cautiously optimistic about Episode 7, I have not truly loved a Star Wars movie since the originals. And as it turns out, many of the millions of people on Rotten Tomatoes agree:

Somerville, MA has been fighting a war against rats for months, and now we have the data to show that it’s working: reported sightings have dropped 66% year-to-date. Some of that is due to weather patterns and random fluctuation, but a Bayesian model of the data estimates that the City’s policies have reduced calls by 40%. Three years ago, the city where I work was dealing with an onslaught of rats.

Here’s a problem governments are faced with every day: you have a limited amount of resources to maintain aging infrastructure, in this case streets. Do you spend more on crack sealing and preventive maintenance, or full depth reclamation? Which streets should you fix first? I am not an engineer (in fact, part of the reason I am writing this post is to get feedback from engineers); but I have thought a lot about this, and I think I have a decent method for prioritizing roadway repairs that anyone could implement using the open-source program R.

When I first started as an analyst in local government, I wasted a lot of time repeating tasks in Excel that had been done dozens of times before. SomerStat, the office where I worked and which I later directed, is one of the oldest local government divisions dedicated to crunching data. Inspired by the CitiStat model, which itself was inspired by CompStat, the idea was to use data to improve efficiency. And yet here I was, with fairly inefficient work routines that included pulling data into spreadsheets, munging it one step at a time, and then repeating it all for the next ‘stat’ meeting.
