Visualization of Citi Bike station node influence

My group's midterm project is about pruning "weak" stations and/or adding supplementary stations to the Citi Bike network in the hope of improving the system overall. The first step is to identify weak stations that might be candidates for pruning, so I made a visualization to find them.

I used Gephi to create my visualization. Gephi worked best when I gave it separate data tables for nodes and edges. I loaded the Citi Bike station data (nodes) and trip data for February 2014 (edges) to form a graph.
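Gephi's spreadsheet import accepts a nodes table with `Id` and `Label` columns and an edges table with `Source`, `Target` and an optional `Weight` column. As a minimal sketch of how the two tables might be built from raw trips, the Python below counts the trips between each pair of stations and uses that count as the edge weight. The trip field names (`start_id`, `start_name`, `end_id`, `end_name`) and the sample stations are hypothetical; rename them to match the actual columns of the trip data.

```python
import csv
import io
from collections import Counter


def build_gephi_tables(trips):
    """Turn trip records into Gephi-style node and edge rows.

    Each trip is a dict with the hypothetical keys 'start_id', 'start_name',
    'end_id' and 'end_name'. Edges are directed (start -> end) and weighted
    by the number of trips between the two stations.
    """
    names = {}
    counts = Counter()
    for t in trips:
        names[t["start_id"]] = t["start_name"]
        names[t["end_id"]] = t["end_name"]
        counts[(t["start_id"], t["end_id"])] += 1
    nodes = [{"Id": sid, "Label": name} for sid, name in sorted(names.items())]
    edges = [{"Source": s, "Target": e, "Weight": n}
             for (s, e), n in sorted(counts.items())]
    return nodes, edges


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text (e.g. to save as nodes.csv / edges.csv)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample: two trips A -> B and one trip B -> A.
trips = [
    {"start_id": 1, "start_name": "Station A", "end_id": 2, "end_name": "Station B"},
    {"start_id": 1, "start_name": "Station A", "end_id": 2, "end_name": "Station B"},
    {"start_id": 2, "start_name": "Station B", "end_id": 1, "end_name": "Station A"},
]
nodes, edges = build_gephi_tables(trips)
print(to_csv(nodes, ["Id", "Label"]))
print(to_csv(edges, ["Source", "Target", "Weight"]))
```

Pre-aggregating the trips keeps the edges table to one row per station pair instead of one row per trip.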

One of the first things I noticed was that the nodes could easily be divided into Manhattan stations and Brooklyn stations just by looking at the network's modularity (the strength of division of a network into modules). There are, of course, thousands of trips that start in Manhattan and end in Brooklyn, and vice versa, but the vast majority of trips start and end in the same borough. In the graphic below, Gephi grouped the nodes into this shape after I ran a modularity analysis with a resolution of 2.0.
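To make the idea of modularity concrete, here is a minimal Python sketch that scores a given partition of an undirected, weighted graph with Newman's modularity, Q = sum over modules c of (L_c / m - (d_c / 2m)^2). It is not what Gephi runs: Gephi searches for a good partition itself, and its resolution parameter (2.0 here) follows its own convention, whereas this sketch only scores a split you supply, at the standard resolution of 1. Treating trip counts as undirected weights is an assumption, and the six-station graph (two tightly linked groups joined by a single trip) is a hypothetical stand-in for the Manhattan/Brooklyn split.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight) tuples.
    community: dict mapping each node to its module label.
    Q = sum over modules c of L_c / m - (d_c / (2m))**2, where m is the total
    edge weight, L_c the weight of edges inside c, and d_c the summed degree of c.
    """
    m = 0.0
    inside = defaultdict(float)  # L_c: edge weight with both ends in module c
    degree = defaultdict(float)  # d_c: total weighted degree of module c
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical graph: two triangles of stations joined by one cross-river link.
edges = [
    ("M1", "M2", 1), ("M2", "M3", 1), ("M1", "M3", 1),
    ("B1", "B2", 1), ("B2", "B3", 1), ("B1", "B3", 1),
    ("M3", "B1", 1),
]
by_borough = {n: n[0] for n in ["M1", "M2", "M3", "B1", "B2", "B3"]}
one_module = {n: "all" for n in by_borough}
print(modularity(edges, by_borough))  # about 0.357 (5/14)
print(modularity(edges, one_module))  # 0.0
```

A split that keeps most trips inside one group scores well above zero, while lumping every station into a single module scores exactly zero.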


Because the two modules are largely separate, pruning the weakest node in the network as a whole might not affect the system as much as we had hoped. Focusing on each module separately is likely to produce better results.

To determine the weakest nodes, I looked at eigenvector centrality, "a measure of the influence of a node" in a network (Google's PageRank is a variant of this metric). Connections to high-scoring nodes contribute more to a node's score than connections to low-scoring nodes. The results (after 1,000 iterations) can be seen below. Gephi has several layout options for displaying the network graph so that the chosen measurement (in this case, eigenvector centrality) is easy to read. The Brooklyn visualization uses a different layout for improved legibility.
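Eigenvector centrality gives each node a score proportional to the weighted sum of its neighbours' scores, so it can be computed by repeatedly multiplying a score vector by the adjacency matrix (power iteration). The Python below is a minimal sketch under stated assumptions: the graph is treated as undirected with trip counts as weights, and it iterates with A + I rather than A, which leaves the leading eigenvector unchanged but keeps the iteration from oscillating on bipartite graphs. It borrows the 1,000-iteration cap mentioned above but is not Gephi's implementation, and the three-station chain is hypothetical.

```python
import math
from collections import defaultdict


def eigenvector_centrality(edges, max_iter=1000, tol=1e-10):
    """Eigenvector centrality of an undirected weighted graph by power iteration.

    edges: iterable of (u, v, weight) tuples, e.g. trip counts between stations.
    Returns a dict of scores scaled to unit Euclidean length.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = adj[u].get(v, 0) + w
        if u != v:
            adj[v][u] = adj[v].get(u, 0) + w
    nodes = list(adj)
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        # (A + I) x: a node's own score plus the weighted scores of its neighbours,
        # so links to high-scoring stations raise a station's score the most.
        new = {n: x[n] + sum(w * x[nb] for nb, w in adj[n].items()) for n in nodes}
        norm = math.sqrt(sum(s * s for s in new.values()))
        new = {n: s / norm for n, s in new.items()}
        converged = sum(abs(new[n] - x[n]) for n in nodes) < len(nodes) * tol
        x = new
        if converged:
            break
    return x


# Hypothetical three-station chain: the middle station links to both others.
scores = eigenvector_centrality([("A", "B", 1), ("B", "C", 1)])
for station, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(station, round(score, 4))  # lowest-scoring (weakest) stations first
```

The middle station ends up with the highest score because it is the only one connected to every other station.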


Red nodes have higher scores and blue nodes have lower scores. The visualization makes it obvious which nodes are candidates for pruning; Railroad Ave & Kay Ave in Brooklyn and Leonard St & Church St in Manhattan are the lowest-scoring nodes in their respective boroughs.
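Because the weakest stations are read off per borough rather than across the whole network, this selection step amounts to taking the lowest scores within each module. A minimal Python sketch, assuming the centrality scores and module labels are available as plain dicts (for example, read from a table exported from Gephi); the station names and scores below are made-up placeholders, not the February 2014 results.

```python
def weakest_per_module(scores, module, k=1):
    """Return the k lowest-scoring nodes in each module.

    scores: dict node -> centrality score.
    module: dict node -> module label (e.g. borough).
    """
    grouped = {}
    for node, score in scores.items():
        grouped.setdefault(module[node], []).append((score, node))
    return {m: [node for _, node in sorted(members)[:k]]
            for m, members in grouped.items()}


# Placeholder values for illustration only.
scores = {"M1": 0.31, "M2": 0.05, "M3": 0.22, "B1": 0.12, "B2": 0.02}
module = {"M1": "Manhattan", "M2": "Manhattan", "M3": "Manhattan",
          "B1": "Brooklyn", "B2": "Brooklyn"}
print(weakest_per_module(scores, module))  # {'Manhattan': ['M2'], 'Brooklyn': ['B2']}
print(weakest_per_module(scores, module, k=2))
```

Asking for more than one candidate per module (`k`) may be useful, since the single lowest score can depend on which period of data was used.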

These results reflect only trips made in February 2014. To get a sense of the all-time weakest nodes, we would need to run these algorithms on all available data, since ridership likely changes with the seasons and some stations may be used less in winter than during the rest of the year.
