We Can Do Magic: Transforming an Abstract Network Visualization into a London Tube Map

By Michael J. Stamper, M.F.A.
Data Visualization Designer and Consultant for the Arts
Spring – Summer 2018

Once in a while, a project comes along that really lets you explore the creative side of data visualization and design. When this project came across my desk, the client’s first request was to take a simple dataset of 21 colleagues (all from the same department) here at Virginia Tech and visualize each individual’s connections to the others through their contributions to a defined set of shared tasks (first image below). Visualizing this kind of network is straightforward and can be done using Gephi, a free, open-source network visualization program that is described as “the Photoshop™ of network visualization.”

To get started, the initial dataset was cleaned, organized, and formatted so that Gephi could render it. After applying a few built-in algorithms, manual stretches, color adjustments, and node size tweaks, I had my first few drafts of a visualization to send to my clients for review and feedback, before starting the process over again to produce a more refined version. In my experience, a visualization is rarely as good as it can be without a few iterations and adjustments. There is a design process involved that goes beyond the initial request for a simple visualization, as the subsequent figures show.

You can see the first draft (render) of this abstract network on the right. It’s very basic. The network is there: colors loosely indicate the result of a “Betweenness Centrality” algorithm that I applied (and manually adjusted for clarity), node size indicates the number of connections to colleagues, and edges (the lines connecting nodes) are weighted by the number of tasks shared between individuals.

The figure to the right displays the individuals in alphabetical order. This layout simply shows the range of node sizes, which, as I mentioned before, is based on the number of connections to colleagues, not the number of tasks. Basing the size on tasks would give some individuals with few connections to colleagues a large node, producing a false visual reading in which these individuals seem more important than those with many connections. The parentheses after each name indicate (# of connections to colleagues) and (# of tasks).
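The difference between sizing nodes by connections and sizing them by tasks can be illustrated with a small sketch. The task assignments and the names `task_members`, `edge_weight`, `connections`, and `person_tasks` are all hypothetical; this is not the client's data or the exact preparation used for the Gephi render, only one plausible way to derive the same three quantities.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: task number -> people who contributed to that task.
task_members = {
    1: ["Person A", "Person B", "Person C"],
    2: ["Person A", "Person B"],
    3: ["Person C", "Person D"],
    4: ["Person D"],
    5: ["Person D"],
}

edge_tasks = defaultdict(set)    # (person, person) -> tasks the pair shares
person_tasks = defaultdict(set)  # person -> tasks they contributed to

for task, members in task_members.items():
    for person in members:
        person_tasks[person].add(task)
    # Every pair of contributors to the same task is connected by that task.
    for a, b in combinations(sorted(members), 2):
        edge_tasks[(a, b)].add(task)

# Edge weight: number of tasks shared by the two people.
edge_weight = {pair: len(tasks) for pair, tasks in edge_tasks.items()}

# Node size: number of distinct colleagues a person is connected to.
connections = defaultdict(set)
for a, b in edge_tasks:
    connections[a].add(b)
    connections[b].add(a)

for person in sorted(person_tasks):
    print(person, len(connections[person]), len(person_tasks[person]))
```

In this made-up example, Person D works on the most tasks (three) but has only one connection, so sizing nodes by task count would overstate that person's place in the network.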

A quick note about the labeling of tasks. There are five tasks: 1, 2, 3, 4, and 5. You’ll see that the labels on the edges are not separated by commas. The reason is that a comma-separated values (CSV) file was used to render the edges and nodes in Gephi. If the label column had contained numerals separated by commas, Gephi would have gotten confused and treated those numbers as separate edges instead of assigning them to a single edge, as seen here. This is easy to fix in post-processing using Adobe Illustrator™. You’ll also notice that the nodes now say “Person” instead of a name. They have names, but for this article they needed to be anonymized. You don’t need to see them anyway; this article is about visualization and design. 🙂
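A minimal sketch of the labeling workaround described above, assuming an edge table with the commonly used `Source`, `Target`, `Weight`, and `Label` columns. The edge data and the `edge_rows` helper are hypothetical. Joining the task numbers without a separator only stays readable because every task number here is a single digit.

```python
import csv
import io

# Hypothetical edges: (source, target, tasks the two people share).
edges = [
    ("Person 1", "Person 2", [1, 2, 5]),
    ("Person 2", "Person 3", [4]),
]

def edge_rows(edges):
    """Yield one CSV row per edge, with the task numbers joined without commas."""
    for source, target, tasks in edges:
        label = "".join(str(task) for task in sorted(tasks))  # e.g. "125"
        yield [source, target, len(tasks), label]

# Write to an in-memory buffer; a real file opened with newline="" works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Source", "Target", "Weight", "Label"])
writer.writerows(edge_rows(edges))
csv_text = buffer.getvalue()
print(csv_text)
```

Each edge keeps exactly four fields, so the label can never be split into extra columns on import, and commas can be added back to the labels later in a layout tool.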

After this first render, my clients said that the straight Gephi visualization was perhaps still not as clear as they’d like, and that they were open to suggestions on how to refine it and make it more engaging and easier to use when communicating their work to colleagues. I’ve always wanted to do a London Tube map-style visualization, and this was the perfect opportunity to take my love for graphic, information, and transportation design (and London!) and apply it to a fairly small network visualization.

I presented this concept to my clients, who were excited about the prospect and approved it, so I started researching how to transform my Gephi renders into a Tube map. You can see the successful result right here.

Comparing the Gephi render with the Tube Map visualization, you’ll notice that node size is still used, tasks have been separated into their own lines (multiple tasks making up for the lost width in the Gephi render), and the visualization has been flattened, so it’s easier for users to see, trace, and engage with the visualization. Plus, it looks really cool, and made my clients very happy!

With this article, we’ve seen the transformation of raw data into a simple (although abstract) network visualization at the intersection of art, graphic design, and interaction design, which is where I find most of the work that comes across my desk landing.

I’m always available if you, your colleagues, or your students would like to discuss any aspect of data visualization and design for your own data.

Thanks for reading!


Viral Networks

by Nathaniel D. Porter
Social Science Data Consultant and Data Education Coordinator
Fall 2017 – Spring 2018

E. Thomas (Tom) Ewing (Professor of History – Virginia Tech) organized an innovative NIH/NEH-funded workshop bringing historians from around the country together to integrate network analysis into research on medical history. Although most participants had never previously used statistics or digital data analytics in their research, each participant created network data and visualizations for chapters in a forthcoming book documenting not only the results but also the challenges and triumphs of the process itself.

We assisted participants with data set creation and visualization. The initial phase combined online software training resources with individual consulting on building data from sources ranging from archival text materials to co-citation networks. In the later phase, we helped refine visualizations to make them clearer and more useful with hands-on design work on some of the more involved visualizations, including a geographic network of Alabama’s desegregation process and a topic network in letters between pioneering female physicians.

Overall, all workshop participants found that, with guidance, they could visualize their data in ways that helped communicate findings clearly. Moreover, in many cases, building and visualizing structured datasets clarified research questions or even inspired new lines of research.


Open Ways to Analyze Fertilizer Cost Trends

by Anne M. Brown, Ph.D.
Science Informatics Consultant and Health Analytics Coordinator
Spring 2017, recurring yearly

Professors in the Department of Agricultural and Applied Economics in the College of Agriculture and Life Sciences needed help accessing data that was originally publicly available through the USDA ERS but is now only found behind a subscription paywall. Faculty, extension agents, and farmers need this information for profit and sustainability predictions. We were able to determine a protocol for finding this information from other available sources and synthesizing it into a clear, usable dataset that is updated regularly and understandable by a broad audience. We communicate with these individuals yearly to provide updated information for the past year, and we are looking toward hosting this information in the future. This work and collaboration allowed us to provide the data needed by a wide variety of clients and to help our Virginia farmers in the process. If you are looking to find data, synthesize it, or analyze it, please reach out to our informatics consultants as you start your projects.


Visualizing Networks for Research and Decision Making

By Michael J. Stamper, M.F.A.
Data Visualization Designer and Consultant for the Arts
Spring 2017

This project involved helping a graduate student in the College of Engineering here at Virginia Tech, who requested our assistance in cleaning and visualizing a very large dataset (6000+ points) that was in a multi-tabbed Excel spreadsheet.

Before this student’s data could be visualized, it first needed to be “cleaned” and the relationships between entities defined. To connect the data to her hypotheses and finalize her research project, she needed help with visualizing concepts, organizing data, and developing a design strategy to communicate her findings effectively and increase the impact of her research. Because the dataset was so large and unique, most of this work had to be done manually before setting up parameters for running the data through various visualization software. Working closely with the student, we determined that the best software for visualizing the data was Gephi, a free, open-source program used for visualizing large datasets and creating abstract networks of clustered nodes and lines (edges).

Once we ran several iterations of her data through Gephi and made subsequent changes to groupings of node clusters, the “story” within her data emerged, as well as themes/sub-themes, and connections between clusters and nodes that supported her extensive research.

On the right, you’ll see a network visualization (with labels and other identifying information removed) showing clusters of themes, subthemes, and relationships. It was created using Gephi, a free, open-source data visualization tool, and Adobe Illustrator.


Design Spaces for Veterans

by Chreston Miller, Ph.D.
Data and Informatics Consultant, Engineering
Spring 2017

This project investigated ways to support veterans reintegrating into civilian life, with a focus on designing an application to aid them in this process. The client was interested in creating personas based on criteria common to a group of veterans; these personas would then guide design choices. This proved to be an intricate clustering problem. The client wanted to avoid building a persona by taking some criteria from one veteran and some from another, because such a mix could combine mutually exclusive criteria and yield an unrealistic persona. The desired result was therefore a set of criteria that a group of veterans actually had in common.
We supported the client by developing a custom algorithm that mined the client’s data for believable personas by identifying groups of veterans with the same criteria. The algorithm performed a multi-dimensional intersection: it first identified the veterans that shared each single criterion, then intersected these sets to group veterans that shared the same combination of criteria.
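The intersection idea can be sketched roughly as follows. This is an illustrative reading of the description above, not the algorithm actually delivered to the client. The veteran records, the criteria names, the `shared_criteria_groups` function, and its `min_group` and `min_criteria` parameters are all assumptions. It also tries every combination of criteria, which is only practical for a small number of criteria.

```python
from itertools import combinations

# Hypothetical data: veteran -> set of criteria that apply to them.
veterans = {
    "V1": {"student", "parent", "urban"},
    "V2": {"student", "parent", "rural"},
    "V3": {"student", "urban"},
    "V4": {"employed", "rural"},
}

def shared_criteria_groups(veterans, min_group=2, min_criteria=2):
    """Map each combination of criteria to the veterans who share all of them."""
    # Step 1: for each single criterion, collect the veterans who have it.
    by_criterion = {}
    for name, criteria in veterans.items():
        for criterion in criteria:
            by_criterion.setdefault(criterion, set()).add(name)

    # Step 2: intersect the single-criterion sets for each combination and
    # keep the combinations that a large enough group really has in common.
    groups = {}
    names = sorted(by_criterion)
    for size in range(min_criteria, len(names) + 1):
        for combo in combinations(names, size):
            members = set.intersection(*(by_criterion[c] for c in combo))
            if len(members) >= min_group:
                groups[combo] = members
    return groups

groups = shared_criteria_groups(veterans)
for combo, members in groups.items():
    print(combo, sorted(members))
```

Because every reported combination is held in full by real veterans, a persona built from it cannot mix criteria that never occur together, such as "urban" and "rural" in this made-up data.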