Recently I was asked to help investigate the performance of a fancy bit of hardware. The device in question was an xCelor XPM3, an ultra-low-latency Layer 1 switch. Layer 1 switches are often used by trading firms to replicate data from one source out to many destinations. The exciting thing about these switches is that they can take network packets in one port and redirect them back out another in 3 billionths of a second. That is fast. It may be no surprise, but that is so fast that it is pretty hard to measure.
To measure something in nanoseconds, billionths of a second, you need some equally exotic gear. I happened to have an FPGA-based packet capture card with a clock disciplined by a high-end GPS receiver, some optical taps, and a pile of Twinax cables. Oh boy, let the fun begin. Even with toys like these, the minimum resolution of my packet capture system was 8 nanoseconds, nearly 3 times coarser than the time it takes the XPM3 to move a packet. To get around this problem, I replicated each packet through every port on the XPM3, bouncing it all the way down the switch and back. Physically this meant that every port was diagonally connected with Twinax cables like this:
And inside the XPM3 it was moving data between ports like this:
The problem now is that I have two variables. Sending a packet down the switch this way means that it moves through 32 replication ports (r) and 30 Twinax cables (t). After running 10 million packets through this test setup, I knew that 32r + 30t + 35 = 212.93259 nanoseconds on average. The ‘35’ is the number of nanoseconds it took for the packet capture system to timestamp the arriving packets. But how could I determine the time for replication and the time in the Twinax cables? The answer was to get a second equation so I could solve for both variables by substitution.
I ran a second trial using just ports 1-8 instead of the full 32 ports. This gave me 8r + 6t + 35 = 75.958503 nanoseconds. Now with two variables and two equations I could simply substitute to calculate that a replication port took 3.35 nanoseconds per hop and Twinax cables took 2.34 nanoseconds for each 0.5-meter length.
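The substitution above is just a two-equation, two-unknown system; a minimal sketch of the arithmetic, using the measured averages from both trials:

```python
# Solve the two latency equations for r (replication hop) and t (Twinax cable):
#   32r + 30t + 35 = 212.932590  (full 32-port loop)
#    8r +  6t + 35 =  75.958503  (8-port loop)

# Subtract the 35 ns timestamping overhead from each measurement
a1, b1, c1 = 32, 30, 212.932590 - 35
a2, b2, c2 = 8, 6, 75.958503 - 35

# Eliminate t: multiply the second equation by 5 so both have a 30t term,
# then subtract the first equation
r = (5 * c2 - c1) / (5 * a2 - a1)
t = (c2 - a2 * r) / b2

print(f"replication hop: {r:.3f} ns")  # prints 3.357
print(f"Twinax cable:    {t:.3f} ns")  # prints 2.350
```

The exact solution is about 3.357 ns per hop and 2.350 ns per cable, which matches the figures reported above once rounded.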
The Chicago-based bike sharing company, Divvy, hosted a contest this past winter. They released anonymized ride data on over 750,000 rides taken in 2013. The contest had several categories to see who could draw the most meaning from these data and who could design the most beautiful representation of the rides. I entered the contest as a way to learn about D3.js, a new data visualization tool that is amazingly powerful. And complicated.
I thought it would be fun to see where most people were coming from and going to. When I start a play-project like this, I reach for my two favorite data analysis machetes, Postgres and Python. Cleaning and loading the data into Postgres was pretty straightforward, which led to the fun part: trying to derive a meaningful framework with which to examine the ride data.
Pretty quickly it became apparent that breaking the day into small time slices and aggregating the top departure points would yield interesting insights. It became even more interesting when I categorized the top departure points by their corresponding top destinations.
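The aggregation step can be sketched in plain Python. This is not the actual analysis code; the station names, hours, and schema below are made up for illustration, and the real work happened against the Postgres tables:

```python
from collections import Counter, defaultdict

# Hypothetical rides: (start_hour, departure_station, destination_station).
# The real Divvy export has full timestamps and station IDs.
rides = [
    (7, "Union Station", "Willis Tower"),
    (7, "Union Station", "Merchandise Mart"),
    (7, "Union Station", "Willis Tower"),
    (7, "Ogilvie", "Willis Tower"),
    (12, "Navy Pier", "Millennium Park"),
    (12, "Millennium Park", "Navy Pier"),
    (12, "Navy Pier", "Millennium Park"),
]

# Bucket rides into time slices, then rank departure points within each slice
slices = defaultdict(Counter)   # hour -> departure counts
pairs = defaultdict(Counter)    # (hour, departure) -> destination counts
for hour, dep, dest in rides:
    slices[hour][dep] += 1
    pairs[(hour, dep)][dest] += 1

for hour in sorted(slices):
    for dep, n in slices[hour].most_common(2):   # top departure points
        top_dest, _ = pairs[(hour, dep)].most_common(1)[0]
        print(f"{hour:02d}:00  {dep} ({n} rides) -> top destination: {top_dest}")
```

The same grouping is a natural `GROUP BY` in Postgres; the in-memory version is just easier to show here.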
At different times of the day the pattern of rides looks wildly different. Early in the morning a massive influx of riders use Divvy bikes near the city's two main train stations. In the middle of the day, bike usage centers around the primary tourist attractions, with everyone coming and going to the same places. And in the small hours of the morning the bikes serve as cab replacements in the neighborhoods with lots of bars.
With the ride data extracted, I used D3 to make it beautiful. D3 allows shapes to move and change color in seemingly magical ways inside a web browser. Each departure point can be linked to its top destinations and they will arrange themselves. Crain’s Chicago Business newspaper saw my entry and is running a special print edition of the graphic in an upcoming paper. You can see the online edition here.
In 2008 the property bubble burst in Chicago. It is hard to gauge a recession without some hard numbers, but a visual representation can give a powerful view into the scale of the decline in building activity, measured here by the total value of building permits issued to large builders. A big thank you to Chicago's Open Data Portal for providing the data to work with.
The Data Portal has all of Chicago's building permits available online, and they are a great metric for building activity. I narrowed the permits down to construction activity (elevator repairs and fire alarm systems didn't count) and used Python and Gephi to graph out the connections. Take a look at the result:
Yearly building activity of the largest builders in Chicago, 2006-2012
It was important to filter out smaller builders to have a clear image. The threshold for a builder to make the graph was at least 100 building permits or a total permit value of over two million dollars. Each year is scaled to the total value of the building permits for that year, ranging from $8.3 billion in 2006 down to $752 million in 2009 and back up to $4 billion in 2011. Look what happened to John C. Hanna's activity. In 2006 Hanna's firm was the most active by number of properties. In 2007 and 2008 his activity dropped significantly, and his firm failed to make the graph in 2009. By 2010 Hanna was back on the graph, and by 2011 he was growing again.
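The builder filter itself is simple to express. A minimal sketch of the threshold logic, assuming permit rows reduced to (builder, value) pairs; the builder names and dollar values below are made up, not actual Data Portal records:

```python
from collections import defaultdict

# Hypothetical permit rows: (builder name, permit value in dollars)
permits = [
    ("Acme Builders", 2_500_000),
    ("Tiny Co", 50_000),
    ("Tiny Co", 30_000),
]
permits += [("Volume LLC", 10_000)] * 100   # high-volume, low-value builder

counts = defaultdict(int)
totals = defaultdict(float)
for builder, value in permits:
    counts[builder] += 1
    totals[builder] += value

# A builder makes the graph with >= 100 permits or > $2M in total permit value
keep = {b for b in counts if counts[b] >= 100 or totals[b] > 2_000_000}
print(sorted(keep))   # prints ['Acme Builders', 'Volume LLC']
```

Note the two thresholds are alternatives, so a builder qualifies on volume alone (Volume LLC) or on dollar value alone (Acme Builders).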
If you would like the higher resolution version or a PDF of the image, contact me.