Blog: Cleaning and analyzing data

Today I went through the process of cleaning data and turning it into simple visualizations. I chose a data set about moving permits in Boston. Boston is a unique city because most lease cycles follow the school-year calendar. I wanted to show the very clear increase in moving permits granted right at the end of August and, with it, the mayhem that time of year brings. This is a factoid story because the basis for the analysis was already knowing about Boston’s school-year-oriented lease cycle.

I used this data set, which includes the dates permits are issued and the dates they expire for moves to each town. I opened the data in OpenRefine to clean it. The data set was a little tricky to understand at first because it includes two categories for cities, Applicant City and City. The Applicant City category is the city from which someone is applying to move to the Boston area.

The data was a bit of a mess, as permits aren’t necessarily categorized by zip code but by city or neighborhood. There was a ton of overlap in how places were categorized, so I picked a few Boston neighborhoods to look for patterns in.
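For readers who would rather script part of this cleanup, here is a minimal sketch of one way to spot place names that differ only in spelling, casing, or punctuation. It follows the idea behind OpenRefine's key-collision "fingerprint" clustering, but it is a simplified stand-in and not necessarily the steps used for this post. The sample values are made up for illustration, and it would not resolve the deeper overlap between cities and neighborhoods.

```python
import string
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a simple key: lowercase, drop punctuation, sort the unique words."""
    cleaned = name.strip().lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def group_variants(names):
    """Group raw place names that share the same fingerprint key."""
    groups = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items()}


# Hypothetical sample values, not taken from the actual permit data.
sample = ["Jamaica Plain", "jamaica plain ", "JAMAICA PLAIN.", "Allston", "allston"]
groups = group_variants(sample)

for key, variants in groups.items():
    print(key, "<-", variants)
```

Any group with more than one variant is a candidate for merging into a single label before counting permits.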

Right now I’m just looking for general patterns, but if I were to take the time to properly clean this data, I could use my zip code datasheet to check that the zip code on each permit matches how the city or neighborhood was categorized by the moving permit data collectors in Boston. There’s going to be a margin of error, but this will at least give us a general sense of whether our factoid is correct.
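The zip code cross-check described above could look roughly like this in Python. This is only a sketch: the field names `zip` and `city`, the lookup table, and the sample permits are all assumptions for illustration, not the real columns or rows from the permit data.

```python
# Hypothetical lookup from a zip code datasheet (illustrative entries only).
ZIP_TO_NEIGHBORHOOD = {
    "02134": "Allston",
    "02130": "Jamaica Plain",
}


def find_mismatches(permits, zip_lookup):
    """Return (zip, recorded place, expected place) for permits that disagree."""
    mismatches = []
    for permit in permits:
        expected = zip_lookup.get(permit["zip"])
        if expected is None:
            continue  # zip code not in the datasheet, so nothing to compare
        if expected.lower() != permit["city"].strip().lower():
            mismatches.append((permit["zip"], permit["city"], expected))
    return mismatches


# Made-up sample rows; "zip" and "city" are assumed field names.
sample_permits = [
    {"zip": "02134", "city": "Allston"},
    {"zip": "02134", "city": "Boston"},
    {"zip": "99999", "city": "Brighton"},
]
mismatches = find_mismatches(sample_permits, ZIP_TO_NEIGHBORHOOD)
print(mismatches)
```

Rows that come back as mismatches are the ones worth relabeling by hand, and the share of mismatches gives a rough feel for the margin of error.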

I exported the data set from OpenRefine to Excel and worked in Pivot Tables to get to know the data a bit better. I looked at the labels to further understand what the data included. I then created a Pivot Chart to track the most popular day and month people needed moving permits in Boston. Even in a preliminary Pivot Chart, the uptick in August moves each year was easy to see.

Quick Pivot Chart that shows the upticks in moving permit expirations in August each year.
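The same month-by-month count can be sketched outside Excel. The snippet below assumes the dates are plain text in `YYYY-MM-DD` form, which may not match the real export, and the sample dates are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def permits_per_month(date_strings, fmt="%Y-%m-%d"):
    """Count permits per (year, month), much like a Pivot Table grouped by month."""
    counts = Counter()
    for text in date_strings:
        day = datetime.strptime(text, fmt)
        counts[(day.year, day.month)] += 1
    return counts


# Invented sample dates, not real permit records.
sample_dates = ["2019-08-30", "2019-08-31", "2019-09-01", "2019-03-15"]
counts = permits_per_month(sample_dates)

# The busiest (year, month) pair and how many permits fell in it.
busiest = counts.most_common(1)[0]
print(busiest)
```

Run against the full column of dates, the largest counts should line up with the peaks in the Pivot Chart.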

I then opened the data in Tableau to create some visualizations that show this trend more clearly. Here is a dashboard I created.

I think this data is a great jumping-off point to show the chaos of a September 1 lease cycle. I would like to pair this data with data about the surge in the cost of movers during August/September.

If you move into Boston during late August or early September, movers are in high demand and the move winds up being a lot more expensive. A moving truck permit costs $69 per day for an unmetered spot in Boston, plus an extra $40 per day for a metered spot. This, in addition to the higher fees to rent a truck or hire movers, could make an already costly college move even more costly. I’m curious how much money an individual would save by moving into Boston in July or October, the months before and after August/September. Would they save enough to make it worth moving in a month early?
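To make that question concrete, here is a small sketch of the arithmetic. The permit fees are the ones quoted above; the mover quotes are purely hypothetical placeholders, since I don't have that data yet.

```python
UNMETERED_FEE_PER_DAY = 69      # permit fee quoted above, in dollars
METERED_SURCHARGE_PER_DAY = 40  # extra fee for a metered spot, in dollars


def permit_cost(days, metered=False):
    """Permit cost for a number of days, with the surcharge if the spot is metered."""
    daily = UNMETERED_FEE_PER_DAY + (METERED_SURCHARGE_PER_DAY if metered else 0)
    return days * daily


def total_move_cost(days, metered, mover_quote):
    """Permit cost plus whatever the movers or truck rental charge."""
    return permit_cost(days, metered) + mover_quote


# Hypothetical mover quotes, only to show the comparison.
peak_quote = 900      # assumed late August / early September price
off_peak_quote = 600  # assumed July or October price

savings = total_move_cost(1, True, peak_quote) - total_move_cost(1, True, off_peak_quote)
print(permit_cost(1), permit_cost(1, metered=True), savings)
```

Since the permit fee is the same in both cases, the savings come entirely from the mover quote, and that number would then need to be weighed against an extra month of rent.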
