Networks: How I learned to stop worrying and love the hairball

Do a Google image search for data visualization and undoubtedly you will see many examples of networks, otherwise known as graphs. The identification and study of these networks is useful in a variety of fields, from social network analysis in sociology and social informatics to the study of predation networks in ecology. If you can identify connections between groups of entities, then you can study them using some aspect of network theory. However, the visual representations of these networks as graphs are often difficult to interpret. This post intends to shed some light on the topic of network visualization.

Essentially, networks are data structures that represent relationships between entities. For example, Author A writes an article with Author B. Obviously in this case the authors are the entities and are connected through their co-authoring relationship. Graphs consist of nodes (entities) and edges (relationships that connect the entities). We might visually represent the previous example as:
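The same two-author example can also be written down as data. Here is a minimal sketch in plain Python, assuming we store nodes as a set of names and each edge as an unordered pair; the variable and function names are my own choices for illustration, not part of any particular tool.

```python
# Nodes are the entities; edges are the relationships that connect them.
nodes = {"Author A", "Author B"}

# A frozenset is an unordered pair, so {A, B} and {B, A} are the same
# co-authoring relationship.
edges = {frozenset({"Author A", "Author B"})}


def are_connected(u, v):
    """Return True if an edge links the two nodes, in either order."""
    return frozenset({u, v}) in edges


print(are_connected("Author A", "Author B"))
print(are_connected("Author B", "Author A"))
```

Both calls report a connection, because co-authorship has no direction.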

Edges will either be directed (unidirectional or bidirectional) or undirected. Directed edges represent directionality in the relationship between the entities they connect. For example, Author A sends Author B an email:

Perhaps Author B responds, in which case the edge becomes bidirectional:
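To make the email example concrete, here is a small sketch that assumes each directed edge is stored as a (source, target) tuple. The helper name is hypothetical; it simply checks whether an edge exists in both directions.

```python
# A directed edge is an ordered pair: (sender, recipient).
edges = {("Author A", "Author B")}


def is_bidirectional(u, v, edge_set):
    """Return True if there is an edge from u to v and one from v to u."""
    return (u, v) in edge_set and (v, u) in edge_set


# Only Author A has written so far, so the edge is unidirectional.
before_reply = is_bidirectional("Author A", "Author B", edges)

# Author B responds, which adds the edge in the opposite direction.
edges.add(("Author B", "Author A"))
after_reply = is_bidirectional("Author A", "Author B", edges)

print(before_reply, after_reply)
```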

Edges can also have a weight, which represents a numerical attribute of the relationship. Weight is visually depicted by the thickness of an edge, which is drawn proportional to that attribute. For example, maybe Author A and Author B communicate over email a total of 10 times:
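One way to arrive at such a weight is to count the messages. The sketch below assumes a made-up email log (the 6 and 4 split is invented for illustration) and treats the relationship as undirected, so a message in either direction adds to the same edge.

```python
from collections import Counter

# Hypothetical email log of (sender, recipient) pairs, invented for this example.
emails = [("Author A", "Author B")] * 6 + [("Author B", "Author A")] * 4

# Ignore direction: every email between the same two authors
# adds 1 to the weight of the same undirected edge.
weights = Counter(frozenset(pair) for pair in emails)

ab_weight = weights[frozenset({"Author A", "Author B"})]
print(ab_weight)  # 10 emails in total
```

A visualization program would then map this number to the thickness of the line between the two authors.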

Nodes have degree. If the network is undirected, a node’s degree is the total number of edges connected to that node:
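Counting degree is simple enough to sketch directly. The edge list below is hypothetical (Authors C and D are added only to make the counts more interesting), and each edge adds 1 to the degree of both of its endpoints.

```python
from collections import Counter

# Hypothetical undirected co-authoring network.
edges = [
    ("Author A", "Author B"),
    ("Author A", "Author C"),
    ("Author A", "Author D"),
    ("Author B", "Author C"),
]


def degrees(edge_list):
    """Count how many edges touch each node in an undirected edge list."""
    counts = Counter()
    for u, v in edge_list:
        counts[u] += 1
        counts[v] += 1
    return counts


degree = degrees(edges)
print(degree["Author A"])  # 3: connected to B, C, and D
```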

If the network is directed, then nodes will have an in-degree and an out-degree. In-degree corresponds to the number of incoming edges and out-degree corresponds to the number of outgoing edges:
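For a directed network the same counting idea splits in two. This sketch assumes edges are (source, target) tuples and uses an invented set of emails among four authors.

```python
from collections import Counter

# Hypothetical directed network: each tuple is (sender, recipient).
edges = [
    ("Author A", "Author B"),
    ("Author B", "Author A"),
    ("Author C", "Author A"),
    ("Author A", "Author D"),
]

# Out-degree counts the edges leaving a node; in-degree counts those arriving.
out_degree = Counter(source for source, _ in edges)
in_degree = Counter(target for _, target in edges)

print(out_degree["Author A"], in_degree["Author A"])  # 2 outgoing, 2 incoming
```

A node that never appears as a sender, such as Author D here, simply has an out-degree of 0.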

As networks grow, the number of edges will increase. Graph density describes how many edges a graph has relative to the number of edges that could exist among its nodes. As graph density increases, it can result in what some people refer to as a hairball. Most network visualization programs, such as Gephi, provide layouts to handle different types of graphs, large and small or dense and sparse. To a large extent, these layouts will determine how the graphs are interpreted, e.g. the meaning behind the distance between nodes or the more ‘influential’ nodes being placed at the center of the graph. Despite their complex nature, much can be learned from very dense graphs when appropriate layouts are applied:
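Returning to graph density for a moment: one common way to put a number on it is to divide the edges a graph has by the maximum number it could have. The function below is a minimal sketch of that definition, assuming a simple graph with no self-loops or duplicate edges; the function name and parameters are my own.

```python
def density(num_nodes, num_edges, directed=False):
    """Ratio of existing edges to the maximum possible in a simple graph."""
    if num_nodes < 2:
        return 0.0  # fewer than two nodes means no edge is possible
    # Each of the n nodes could point to the n - 1 others.
    possible = num_nodes * (num_nodes - 1)
    if not directed:
        possible //= 2  # an undirected pair is counted once, not twice
    return num_edges / possible


# Four authors who have all co-authored with one another: a tiny hairball.
print(density(4, 6))  # 1.0
# The same four authors with only three co-authoring relationships.
print(density(4, 3))  # 0.5
```

A density near 1 is where the hairball lives; a density near 0 describes a sparse graph that most layouts can draw legibly.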

With the graph above we might be studying some macro-level phenomenon and be perfectly happy to see how large clusters of nodes are grouped together. However, if we are interested in more granular aspects of the relationships between the nodes, then this graph will not serve our purposes. As with most research, the questions asked at the beginning will guide the various decisions made along the way, including layout selection. Hopefully, this post has slightly demystified network visualization.

If you are interested in learning more about this topic, I will be teaching an introduction to network visualization with Gephi this fall as part of a digital humanities workshop series. More details to come.

-TP
