Wednesday, December 05, 2012

Playing with D3.js and Beer

For months -- years? Yeah, years -- I've been telling myself I would learn D3.js. It's just so cool. I dabbled a bit in the past. (Dabbling is perhaps an overstatement. Whatever is smaller than a dabble, that's what I did.) I'm lazy. I'm busy. I'd rather watch an episode of Veronica Mars for the eighth time.

But recently, I finally found something that could motivate me to sit down and play with D3.js:

Some of my homebrew brown ale


Yup. Beer.

Data visualization and beer are two of my favorite things (along with writing and running), so it makes sense. With the help of Scott Murray's excellent tutorials and a good strain of Brett, I got to work.

Various people have looked at breweries per capita by state in the U.S., but I wanted to show where the quality beer is concentrated. You know, beer you write songs about. Beer you want to share with your best friends. Beer that inspires you to learn some code. So I used data* released by the Great American Beer Festival (GABF) on the medal winners at its October 2012 event. (Sadly, I missed it...) Basic question: Which state has the most, best beer?
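The post doesn't say exactly how the per-state totals were tallied from the GABF winner list. Assuming each winner is a record with a `state` field, a minimal plain-JavaScript sketch of that group-and-count step might look like this. The records, state codes, and field names are hypothetical placeholders, not real GABF results; newer D3 versions also offer `d3.rollup` for the same kind of job.

```javascript
// Hypothetical medal records; state codes and categories are
// placeholders, not real GABF results.
const medalWinners = [
  { state: "AA", category: "Category 1", medal: "Gold" },
  { state: "BB", category: "Category 1", medal: "Silver" },
  { state: "AA", category: "Category 2", medal: "Bronze" },
];

// Tally medals per state with a Map keyed by state code.
function countMedalsByState(winners) {
  const counts = new Map();
  for (const { state } of winners) {
    counts.set(state, (counts.get(state) || 0) + 1);
  }
  return counts;
}

const totals = countMedalsByState(medalWinners);
console.log(totals); // Map(2) { 'AA' => 2, 'BB' => 1 }
```

Each row of the resulting map is one bar in a "total medals by state" chart.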

Here's my first dataviz with D3, showing the total number of medals won in each state [or view here]:

Main takeaway: California rises above the rest when you look at sheer beer medals won. The other big winners are Colorado and Oregon (my current home and my home state, respectively), which isn't a surprise if you know your beer.

Ok, it's ugly. It's not that informative. It doesn't have proper labels yet. It has this funky color scheme that doesn't really help you understand the data. There are lots of things I still need to figure out (how do I make the labels for the "zero" states display in black instead of white? how do I make it so that when I mouse over one state, only that state changes from a label to a value? etc.). But the bottom line is that the power of this tool is starting to emerge.
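One possible approach to the black-label question, assuming the labels are drawn inside each bar in white: a zero-medal state has no bar behind its label, so the label can pick its color from its own datum. A minimal sketch with a hypothetical `medals` field; in D3, a function like this can be passed to `selection.attr("fill", ...)`, which calls it once per bound datum.

```javascript
// Choose a label color from the bound datum. Assumes (hypothetically)
// that labels sit inside the bars in white, so a zero-medal state,
// which draws no bar, needs a dark label to show up on the background.
function labelColor(d) {
  return d.medals === 0 ? "black" : "white";
}

// Hypothetical data rows with placeholder state codes.
const labelRows = [
  { state: "AA", medals: 4 },
  { state: "BB", medals: 0 },
];

// With D3 loaded, this would be used as something like:
//   labels.attr("fill", labelColor);
console.log(labelRows.map(labelColor)); // [ 'white', 'black' ]
```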


From here, I thought: not every state has the same number of breweries, so we still aren't looking at the density of good beer. And here's where I started to realize the beauty of D3. I built the visualization above to read its data from a CSV table. Now I wanted to add a new column to that table (breweries per state), and I could.
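Why is adding a column so painless? `d3.csv` loads a file and hands back one object per row, keyed by the header names, with every value as a string (D3 v3's `d3.csv` accepts an optional row-accessor function for converting them). The dependency-free sketch below parses a tiny CSV string the same way, so a new column simply shows up as a new property on every row. The column names `state`, `medals`, and `breweries` and all numbers are hypothetical placeholders, and the naive `split(",")` assumes no quoted fields.

```javascript
// Hypothetical CSV: state codes and numbers are placeholders.
// The breweries column was "added" simply by appending it to each line.
const csvText = `state,medals,breweries
AA,6,40
BB,0,12
CC,3,9`;

// Naive parser: assumes a header row and no quoted or escaped commas.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split("\n");
  const keys = header.split(",");
  return lines.map((line) => {
    const values = line.split(",");
    const row = {};
    keys.forEach((key, i) => {
      // Convert numeric-looking values; leave everything else as a string.
      const v = values[i];
      row[key] = v !== "" && !isNaN(Number(v)) ? Number(v) : v;
    });
    return row;
  });
}

const table = parseCsv(csvText);
console.log(table[0]); // { state: 'AA', medals: 6, breweries: 40 }
```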

Are you impressed yet? Ok, stay with me...

I found that information** from the Brewers Association. Unfortunately, the Brewers Association only has data on the number of breweries per state through 2011. (They have the total breweries in the U.S. as of July 2012, but it isn't broken down by state.) So yes, you're right, there's a big flaw in the next piece of analysis: I'm comparing 2012 medals against 2011 brewery counts. But remember, this is all a learning exercise so I can become a D3 wizard-in-training. (The upside is that now I can go back, get the 2011 data from the Great American Beer Festival, and redo my analysis, figuring out a better way to extract/scrape data from a PDF in the process.)

So. Found the breweries-per-state data, added it to my CSV file, and BOOM! I could redraw my chart so that it now displays the medals won per brewery in each state. BAM! I only had to change two places in the code. (Values are coming soon; I need to figure out how to round the numbers so they look pretty.)
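The post doesn't show which two places changed. Assuming rows with hypothetical `medals` and `breweries` fields (placeholder numbers below), the derived value and a rounded display label could be computed like this. `toFixed` is plain JavaScript; D3's `d3.format(".2f")` is another way to produce the same kind of display string.

```javascript
// Hypothetical rows with placeholder numbers.
const states = [
  { state: "AA", medals: 6, breweries: 40 },
  { state: "BB", medals: 0, breweries: 12 },
  { state: "CC", medals: 2, breweries: 3 },
];

// Medals per brewery; guard against a state with zero breweries.
function medalsPerBrewery(d) {
  return d.breweries > 0 ? d.medals / d.breweries : 0;
}

// Round for display only; keep the raw number for the bar length.
function formatValue(x, digits = 2) {
  return x.toFixed(digits);
}

for (const d of states) {
  console.log(d.state, formatValue(medalsPerBrewery(d)));
}
// AA 0.15
// BB 0.00
// CC 0.67
```

Keeping the rounding in the label, not in the data, means the bars stay exact while the numbers look pretty.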

D3 viz number two, medals per brewery by state [or view here]:

Main takeaway: Wyoming breweries deliver, each earning roughly 2/3 of a medal on average. And Utah, my favorite unexpected underdog beer state, also cleans up. Uinta? Epic? Irresistible.

Magic, huh? It's all through the power of scales, which are super nifty things in D3 you can read about here or here or here. Essentially, with D3 you don't have to work out the scale by hand: you tell it the span of your data and the space you want to draw in, and it does the math. Feed it new data and it will scale everything for you. Since I spend tons of time making sure the scales are accurate when I make charts in Illustrator, this is pretty fantastic.
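Under the hood, a linear scale is just a straight-line mapping from a data domain to a pixel range. D3 provides it as `d3.scale.linear()` in D3 v3 (renamed `d3.scaleLinear()` in v4 and later), typically with the domain's upper end computed from the data via `d3.max`. The dependency-free sketch below only illustrates the idea; it is not D3's implementation, and the values are placeholders.

```javascript
// Build a function mapping [d0, d1] linearly onto [r0, r1],
// mirroring the idea behind D3's linear scale (not its code).
function linearScale([d0, d1], [r0, r1]) {
  return (x) => {
    // Avoid dividing by zero when the domain collapses to a point.
    if (d1 === d0) return r0;
    return r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
  };
}

// Placeholder values and chart width.
const values = [6, 0, 3];
const width = 300;

// Recomputing the domain from the data is what lets the chart
// rescale itself when the plotted column changes.
const x = linearScale([0, Math.max(...values)], [0, width]);
console.log(values.map(x)); // [ 300, 0, 150 ]
```

Swap in a different column (medals per brewery, say) and only the domain changes; the drawing code stays the same.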

At this point, we're rolling, and basic visualization-for-exploration is (almost) easier than doing it in Excel. Promise. Next, I downloaded some population data from the U.S. Census Bureau (again, 2011 data, because they haven't released the 2012 American Community Survey yet), and TADA! Now we can redraw our bar chart to show medals won per 100,000 people. Cool.
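The per-capita version is one more derived column: medals divided by population, times 100,000. A minimal sketch, assuming the Census numbers are joined in through a hypothetical lookup keyed by the same state codes; all names and numbers below are placeholders, not Census figures.

```javascript
// Hypothetical rows (placeholder numbers) and a population lookup
// keyed by the same state codes, as if read from a second CSV.
const medalRows = [
  { state: "AA", medals: 3 },
  { state: "BB", medals: 4 },
];
const population = new Map([
  ["AA", 600000],
  ["BB", 8000000],
]);

// Medals per 100,000 residents. Multiplying before dividing means
// whole-number inputs are rounded only once, in the final division.
// The check also catches states missing from the lookup (undefined).
function medalsPer100k(medals, people) {
  if (!(people > 0)) throw new RangeError("population must be positive");
  return (medals * 100000) / people;
}

const perCapita = medalRows.map((d) => ({
  ...d,
  per100k: medalsPer100k(d.medals, population.get(d.state)),
}));
console.log(perCapita.map((d) => d.per100k)); // [ 0.5, 0.05 ]
```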

D3 viz number three, medals per capita by state [or view here]:

Main takeaway: Wyoming still kills it. But Colorado and Oregon aren't too shabby.

And that's as far as I got. How long did this take me? Even with the steep-as-the-backside-of-Hope-Pass learning curve of D3, and even with me not actually knowing much about JavaScript, it took a few hours sprinkled throughout one weekend. Most of the time was *still* spent finding and liberating the data, although my D3 noobishness evened the score a bit.

Future projects:
  • Figuring out those tweaks and little touches that will make these charts actually useful to someone.
  • D3 is known for having slick as hell transitions and animations. I want to make my charts morph into each other!
  • With D3, you can make maps. It's a natural fit for this dataset...
  • Wouldn't it be cool to be able to search for the nearest medal winner, based on where you are?
  • One year doesn't really give you an accurate picture of the beer scene, because judges are fickle and new breweries pop up all the time. It would be fun to compile several years of GABF data.
  • Not all of the categories have the same number of entries. Some are packed (everyone wants to submit an IPA) and some are relatively esoteric. You could weight the medals by the relative steepness of the competition.
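The nearest-medal-winner idea boils down to a nearest-neighbor search over coordinates. Below is a brute-force sketch using the haversine great-circle distance. It assumes you'd have latitude/longitude for each medal-winning brewery (the data in this post doesn't include them), and the brewery names and coordinates are hypothetical placeholders. For a few hundred breweries, a simple linear scan is plenty fast.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius

const toRadians = (deg) => (deg * Math.PI) / 180;

// Great-circle distance between two {lat, lon} points, in kilometers.
function haversineKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLon = toRadians(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLon / 2) ** 2;
  // Clamp guards against rounding pushing h slightly above 1.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(Math.min(1, h)));
}

// Brute-force nearest neighbor: check every winner, keep the closest.
// Returns null when the list is empty.
function nearestWinner(here, winners) {
  let best = null;
  let bestKm = Infinity;
  for (const w of winners) {
    const km = haversineKm(here, w);
    if (km < bestKm) {
      best = w;
      bestKm = km;
    }
  }
  return best && { ...best, km: bestKm };
}

// Hypothetical breweries with placeholder coordinates.
const winners = [
  { name: "Brewery A", lat: 40.0, lon: -105.0 },
  { name: "Brewery B", lat: 45.5, lon: -122.7 },
];
console.log(nearestWinner({ lat: 39.7, lon: -105.0 }, winners).name); // Brewery A
```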
And that's just the beginning. I've got a hunch beer data is exciting enough to keep me motivated.

Public Service Announcement: Hug a bar chart
I recently put together a speed-tutorial on creating bar charts in Fusion Tables for a work event. What I told people was, "Never underestimate the power of the bar chart. It's the workhorse of the data visualization world." I just want to give another shout out to bar charts, because really, they are so simple yet so flexible and powerful. So, go out and hug -- er, create one today.

*Both the GABF and the Brewers Association published their data in PDFs. Yup. But the story of getting it out is a blog post for another day...
**The Brewers Association lists 1,989 total breweries in 2011, but when I added up the state numbers, the total was 1,987. Maybe the missing 2 are in Puerto Rico?