1 Network Visualization

Libraries

The following code will check whether the required packages have been installed. If not, it will automatically install them from CRAN.

pkgs <- c(
  "igraph",
  "ggraph",
  "tidygraph",
  "networkD3",
  "heatmaply",
  "dendextend",
  "circlize"
)

missing_pkgs <- pkgs[!(pkgs %in% installed.packages()[, "Package"])]

if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

Network diagrams (also called Graphs) show interconnections between a set of entities. Each entity is represented as a Node (or vertex) while connections between nodes are represented through links (or edges).
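As a minimal sketch of these terms, we can build a tiny igraph object from an edge list and ask for its nodes and edges. The node names and the object names `toy_edges` and `toy_graph` are made up for illustration.

```r
library(igraph)

# Hypothetical edge list: each row is one link between two nodes.
toy_edges <- data.frame(
  from = c("A", "A", "B"),
  to   = c("B", "C", "C"),
  stringsAsFactors = FALSE
)

# Build an undirected graph from the edge list.
toy_graph <- graph_from_data_frame(toy_edges, directed = FALSE)

V(toy_graph)$name   # the nodes (vertices)
ecount(toy_graph)   # the number of links (edges)
```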

1.1 Social Network and 2-d visualizations

In the text visualization section, we demonstrated some visualizations of a text network built from a co-occurrence matrix. Each node is a main character in Pride and Prejudice, and each edge represents the co-occurrence of two characters in the same sentence. Because this network represents the relationships among a group of people, it is a perfect example of a social network.

Let’s re-cap that network quickly.

1.1.1 load the co-occurrence matrix as a network

book_cooc <- readRDS(file = "./data/book_cooc.RDS")
book_cooc[1:5,1:5]

##           mrbennet mrbingley elizabeth jane lydia
## mrbennet        77         4         0    0     0
## mrbingley        4       189        28   23     0
## elizabeth        0        28       698   74    16
## jane             0        23        74  255     7
## lydia            0         0        16    7   131

In the co-occurrence matrix above, each number is the number of sentences that contain both the row word and the column word. For example, 74 sentences contain both elizabeth and jane.

Note that this is only a fraction of the matrix.
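To make the counting concrete, here is a small sketch of how such a matrix could be computed. It is not necessarily how book_cooc was built: the four sentences and the names `toy_sentences`, `toy_incidence` and `toy_cooc` are made up, and we assume a simple substring match is enough to detect a character.

```r
# Hypothetical mini-corpus: one string per sentence.
toy_sentences <- c(
  "elizabeth spoke to jane",
  "jane and lydia left",
  "elizabeth wrote to jane",
  "lydia laughed"
)
toy_characters <- c("elizabeth", "jane", "lydia")

# Sentence-by-character incidence matrix: 1 if the sentence mentions the character.
toy_incidence <- sapply(toy_characters, function(ch) {
  as.integer(grepl(ch, toy_sentences, fixed = TRUE))
})

# t(incidence) %*% incidence counts, for every pair of characters,
# the sentences that mention both of them.
toy_cooc <- crossprod(toy_incidence)
toy_cooc
```

The diagonal counts the sentences that mention a single character, which is why the graph is later built with diag = FALSE.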

Now we treat each character as a node and connect characters by their co-occurrence, just as in the text visualization section. The following two graphs should be familiar to you.

library(igraph)

## 
## Attaching package: 'igraph'

## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum

## The following object is masked from 'package:base':
## 
##     union

book_graphNetwork <- graph_from_adjacency_matrix(book_cooc,mode = "undirected", diag = FALSE)
plot(book_graphNetwork)

cooc_min <- 4
book_cooc[book_cooc < cooc_min] <- 0  # drop pairs that co-occur in fewer than cooc_min sentences
book_graphNetwork <- graph_from_adjacency_matrix(book_cooc, mode = "undirected", diag = FALSE, weighted = TRUE)
# Divide the weights by 5 so that the edges are not too thick; try other divisors to see the effect.
plot(book_graphNetwork, edge.width = E(book_graphNetwork)$weight / 5)
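The effect of the threshold is easier to see on a small matrix. The sketch below uses a made-up 3 x 3 matrix (`toy_m`) and a hypothetical `toy_min`; only the pairs at or above the threshold survive as weighted edges.

```r
library(igraph)

# Hypothetical symmetric co-occurrence matrix for three nodes.
toy_m <- matrix(
  c(10,  4,  1,
     4, 20,  6,
     1,  6, 15),
  nrow = 3,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)

toy_min <- 4
toy_m[toy_m < toy_min] <- 0   # the a-c pair (value 1) is removed

toy_g <- graph_from_adjacency_matrix(
  toy_m, mode = "undirected", diag = FALSE, weighted = TRUE
)

ecount(toy_g)      # number of edges that survive the threshold
E(toy_g)$weight    # their weights, taken from the matrix entries
```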

Or we can use ggraph for a more fine-tuned visualization:

# Plot it with ggraph
library(ggraph)

## Loading required package: ggplot2

ggraph(book_graphNetwork, layout="fr") + 
  geom_edge_link(edge_colour="black", edge_alpha=0.2) +
  geom_node_point( color="#69b3a2", size=3) +
  #scale_edge_width(range=c(1,3)) +
  geom_node_text( aes(label=name), repel = TRUE, size=4, color="#69b3a2") +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )

1.1.2 Layout Matters!

The layout of a network refers to how we present it: how do we decide the location of the nodes and the distance between them?

We will use the highschool dataset included in the ggraph package. The dataset shows how friendships among a group of high school students changed from 1957 to 1958. It is stored as an edge list, as follows:

head(highschool)

##   from to year
## 1    1 14 1957
## 2    1 15 1957
## 3    1 21 1957
## 4    1 54 1957
## 5    1 55 1957
## 6    2 21 1957

1.1.2.1 layout igraph randomly

Let’s start with a random layout.

ggraph(highschool,layout="igraph",algorithm="randomly") + 
  geom_edge_link(edge_colour="black",edge_alpha=0.2) +
  geom_node_point(color="#69b3a2",size=3) +
  #scale_edge_width(range=c(1,3)) +
  #geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )+
 facet_wrap(~year,ncol=2)

1.1.2.2 layout fr

The layout Fruchterman-Reingold is a force-directed layout algorithm. The idea of a force directed layout algorithm is to consider a force between any two nodes. In this algorithm, the nodes are represented by steel rings and the edges are springs between them. The attractive force is analogous to the spring force and the repulsive force is analogous to the electrical force. The basic idea is to minimize the energy of the system by moving the nodes and changing the forces between them.

This layout is usually useful for visualizing very large undirected networks.
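To illustrate the spring and repulsion idea, here is a heavily simplified force-directed iteration in base R. It is a teaching sketch in the spirit of Fruchterman-Reingold, not the implementation used by igraph or ggraph. The function name `fr_sketch`, the cooling factor 0.95 and the starting step size 0.2 are arbitrary choices made for this example.

```r
# Simplified force-directed layout: repulsion between all pairs of nodes,
# attraction along edges, and a shrinking maximum step size ("temperature").
fr_sketch <- function(edges, n, iterations = 100, seed = 1) {
  set.seed(seed)
  pos <- matrix(runif(2 * n, -1, 1), ncol = 2)  # random start in a 2 x 2 square
  k <- sqrt(4 / n)                              # ideal distance between nodes
  temp <- 0.2                                   # maximum displacement per step
  for (it in seq_len(iterations)) {
    disp <- matrix(0, nrow = n, ncol = 2)
    # Repulsive force between every pair of nodes, magnitude k^2 / d.
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (i != j) {
          delta <- pos[i, ] - pos[j, ]
          d <- max(sqrt(sum(delta^2)), 1e-6)
          disp[i, ] <- disp[i, ] + delta / d * (k^2 / d)
        }
      }
    }
    # Attractive force along each edge, magnitude d^2 / k.
    for (e in seq_len(nrow(edges))) {
      i <- edges[e, 1]
      j <- edges[e, 2]
      delta <- pos[i, ] - pos[j, ]
      d <- max(sqrt(sum(delta^2)), 1e-6)
      pull <- delta / d * (d^2 / k)
      disp[i, ] <- disp[i, ] - pull
      disp[j, ] <- disp[j, ] + pull
    }
    # Move every node along its net force, but never further than temp.
    len <- pmax(sqrt(rowSums(disp^2)), 1e-6)
    pos <- pos + disp / len * pmin(len, temp)
    temp <- temp * 0.95                         # cool down
  }
  pos
}

# A 4-node cycle given as a matrix of node indices.
toy_cycle <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 1))
toy_coords <- fr_sketch(toy_cycle, n = 4)
toy_coords   # one row of (x, y) coordinates per node
```

In practice we would call igraph's layout_with_fr() or ggraph's layout = "fr" instead, as in the plot that follows.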

# Plot it
ggraph(highschool,layout="fr") + 
  geom_edge_link(edge_colour="black",edge_alpha=0.2) +
  geom_node_point(color="#69b3a2",size=3) +
  #scale_edge_width(range=c(1,3)) +
  #geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )+
 facet_wrap(~year,ncol=2)

1.1.2.3 layout kk

Kamada-Kawai is another force-based algorithm. It performs very well for connected graphs but gives poor results for unconnected ones.
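Because connectivity matters for this layout, it can be worth checking it first. The sketch below uses a made-up graph consisting of two separate rings; keeping only the largest component is one possible workaround, not something the original analysis does.

```r
library(igraph)

# Hypothetical unconnected graph: a 5-node ring and a 4-node ring side by side.
toy_two_rings <- disjoint_union(make_ring(5), make_ring(4))

is_connected(toy_two_rings)          # is every node reachable from every other?
toy_comp <- components(toy_two_rings)
toy_comp$no                          # number of connected components

# Keep only the nodes of the largest component.
toy_keep <- which(toy_comp$membership == which.max(toy_comp$csize))
toy_largest <- induced_subgraph(toy_two_rings, toy_keep)
is_connected(toy_largest)
```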

ggraph(highschool,layout="kk") + 
  geom_edge_link(edge_colour="black",edge_alpha=0.2) +
  geom_node_point(color="#69b3a2",size=3) +
  #scale_edge_width(range=c(1,3)) +
  #geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )+
 facet_wrap(~year,ncol=2)

1.1.2.4 layout Drl

DrL is another force-directed graph layout toolbox focused on real-world large-scale graphs, developed by Shawn Martin and colleagues at Sandia National Laboratories.

ggraph(highschool,layout="drl") + 
  geom_edge_link(edge_colour="black",edge_alpha=0.2) +
  geom_node_point(color="#69b3a2",size=3) +
  #scale_edge_width(range=c(1,3)) +
  #geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )+
 facet_wrap(~year,ncol=2)

1.1.2.5 layout sphere

The sphere layout places the vertices (approximately) uniformly on the surface of a sphere, so it is a 3D layout. The benefit of using it here is clear: the location of each student is relatively fixed, so we can easily compare the two years.
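The following sketch looks at the coordinates directly with igraph's layout_on_sphere(). The two toy graphs have the same eight nodes but different edges. If the layout depends only on the number and order of the nodes, as the fixed positions described above suggest, the final comparison should report that the two coordinate sets are equal.

```r
library(igraph)

# Two hypothetical graphs on the same 8 nodes with different edges.
toy_ring <- make_ring(8)
toy_star <- make_star(8, mode = "undirected")

sphere_ring <- layout_on_sphere(toy_ring)
sphere_star <- layout_on_sphere(toy_star)

dim(sphere_ring)                     # one row per node, three columns (x, y, z)
all.equal(sphere_ring, sphere_star)  # compare the node positions of the two graphs
```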

ggraph(highschool,layout="sphere") + 
  geom_edge_link(edge_colour="black",edge_alpha=0.2) +
  geom_node_point(color="#69b3a2",size=3) +
  #scale_edge_width(range=c(1,3)) +
  #geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )+
 facet_wrap(~year,ncol=2)

1.1.2.6 combined plot

Alternatively, we can draw the two networks in a single plot, since the nodes are the same. This time we will plot four layouts and decide which one makes the most sense simply by comparing them.

There is no such thing as “the best layout algorithm”, because algorithms are optimized for different scenarios. Experimenting with several of them and choosing the one that suits your data is often helpful!

library(tidygraph)

## 
## Attaching package: 'tidygraph'

## The following object is masked from 'package:igraph':
## 
##     groups

## The following object is masked from 'package:stats':
## 
##     filter

graph <- as_tbl_graph(highschool) %>% 
  mutate(degree = centrality_degree())
lapply(c('stress', 'fr', 'lgl', 'graphopt'), function(layout) {
  ggraph(graph, layout = layout) + 
    geom_edge_link(aes(colour = factor(year)), show.legend = FALSE) +
    geom_node_point() + 
    labs(caption = paste0('Layout: ', layout))
})

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

Other layouts are available in ggraph; more details can be found here: https://www.rdocumentation.org/packages/igraph/versions/0.7.1/topics/layout

1.1.3 Dependency network (Course Structure example)

In the examples above, all nodes are of the same type. However, a network can be constructed in many ways! For example, we can build a network to represent the dependency between nodes at different levels. This is very similar to clustered data.

The Course Structure data (course.csv) describes the structure of the courses included in the Master of Statistics programs (Mathematical Statistics, Economic Statistics, and Epidemiology and Health Statistics) at Renmin University of China. There are two types of nodes: courses and programs. A course and a program are connected if the course belongs to the program.

We visualize this network as an interactive network, so that the clustering pattern is highlighted when you hover over a program node. At the same time, the courses shared between programs are also clear.

library(networkD3)
course<-read.csv("data/course.csv", stringsAsFactors = FALSE)

simpleNetwork(course,
        Source=1,                 # column number of source
        Target=2,                 # column number of target
        height=880,               # height of frame area in pixels
        width=1000,               # width of frame area in pixels
        linkDistance=70,          # distance between nodes; increase this value for more space between nodes
        charge=-30,               # strength of node repulsion (negative value) or attraction (positive value)
        fontSize=8,               # size of the node names
        fontFamily="serif",       # font of the node names
        linkColour="#666",        # colour of edges, MUST be a common colour for the whole graph
        nodeColour="#69b3a2",     # colour of nodes, MUST be a common colour for the whole graph
        opacity=0.9,              # opacity of nodes. 0=transparent. 1=no transparency
        zoom=TRUE                 # allow zooming on the figure
)

If you want to color the nodes, you will have to use a more complicated function: forceNetwork. forceNetwork provides more arguments so that you can fine-tune your plot.

# make a nodes data frame out of all unique nodes
nodes <- data.frame(name = unique(c(course$from, course$to)))

# make a group variable where nodes in course$from are identified
nodes$group <- nodes$name %in% course$from

links <- data.frame(source = match(course$from, nodes$name) - 1,
                    target = match(course$to, nodes$name) - 1)
forceNetwork(Links = links,
             Nodes = nodes,
             Source = "source",
             Target = "target", 
             NodeID ="name", 
             Group = "group",
             opacity = 1, 
             opacityNoHover = 1,
             linkDistance=70,
             fontSize=8,
             fontFamily="serif",
             zoom=T)
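The `match(...) - 1` step in the code above converts node names into zero-based row positions in the nodes table, which is the form the links are given in here. A small sketch with made-up program and course names (the `toy_` objects are hypothetical) shows what the conversion produces:

```r
# Hypothetical edge list: which course belongs to which program.
toy_course <- data.frame(
  from = c("Program A", "Program A", "Program B"),
  to   = c("Course 1", "Course 2", "Course 2"),
  stringsAsFactors = FALSE
)

# One row per unique node, in order of first appearance.
toy_nodes <- data.frame(
  name = unique(c(toy_course$from, toy_course$to)),
  stringsAsFactors = FALSE
)

# match() returns 1-based positions; subtracting 1 makes them zero-based.
toy_links <- data.frame(
  source = match(toy_course$from, toy_nodes$name) - 1,
  target = match(toy_course$to, toy_nodes$name) - 1
)

toy_nodes
toy_links
```

"Course 2" appears twice as a target with the same id, which is how a course shared by two programs ends up as a single node.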

1.2 Heatmap

Another way to visualize a network is to visualize its adjacency matrix directly!

library(heatmaply)

## Loading required package: plotly

## 
## Attaching package: 'plotly'

## The following object is masked from 'package:ggplot2':
## 
##     last_plot

## The following object is masked from 'package:igraph':
## 
##     groups

## The following object is masked from 'package:stats':
## 
##     filter

## The following object is masked from 'package:graphics':
## 
##     layout

## Loading required package: viridis

## Loading required package: viridisLite

## 
## ======================
## Welcome to heatmaply version 1.2.1
## 
## Type citation('heatmaply') for how to cite the package.
## Type ?heatmaply for the main documentation.
## 
## The github page is: https://github.com/talgalili/heatmaply/
## Please submit your suggestions and bug-reports at: https://github.com/talgalili/heatmaply/issues
## Or contact: <tal.galili@gmail.com>
## ======================

## 
## Attaching package: 'heatmaply'

## The following object is masked from 'package:igraph':
## 
##     normalize

1.2.1 unweighted heatmap

heatmaply(book_cooc,
        dendrogram = "both",
        xlab = "", ylab = "", 
        main = "",
        scale = "none",
        margins = c(60,100,40,20),
        grid_color = "white",
        grid_width = 0.0000000001,
        titleX = FALSE,
        hide_colorbar = TRUE,
        branches_lwd = 0.1,
        label_names = c("Name", "With:", "Value"),
        fontsize_row = 7, fontsize_col = 7,
        labCol = colnames(book_cooc),
        labRow = rownames(book_cooc),
        heatmap_layers = theme(axis.line=element_blank())
        )

1.2.2 weighted heatmap

heatmaply(book_cooc, 
      dendrogram = "none",
      xlab = "", ylab = "", 
      main = "",
      scale = "column",
      margins = c(60,100,40,20),
      grid_color = "white",
      grid_width = 0.00001,
      titleX = FALSE,
      hide_colorbar = TRUE,
      branches_lwd = 0.1,
      label_names = c("From", "To:", "Value"),
      fontsize_row = 7, fontsize_col = 7,
      labCol = colnames(book_cooc),
      labRow = rownames(book_cooc),
      heatmap_layers = theme(axis.line=element_blank())
      )

We can even visualize the matrix as another layout using ggraph.

ggraph(book_cooc, 'matrix', sort.by = node_rank_leafsort()) + 
  geom_edge_tile(mirror = TRUE) + 
  coord_fixed()

ggraph(book_cooc, 'matrix', sort.by = node_rank_leafsort()) + 
  geom_edge_point() + 
  coord_fixed()

ggraph(book_cooc, 'matrix', sort.by = node_rank_leafsort()) + 
  geom_edge_bend() + 
  coord_fixed()

This visualization is not only straightforward. Many network layouts scale poorly: edges eventually overlap to the extent that the plot becomes unintelligible, whereas visualizing the network as a matrix avoids overlapping edges completely.

At the same time, this visualization shows a very different pattern from a topology plot. In addition, the node order now has a big influence on the look of the plot.
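To see why the node order matters, consider a made-up network of two triangles whose nodes are interleaved in the matrix. The sketch below reorders the rows and columns by hierarchical clustering with base R's hclust(); this is one possible ordering strategy, chosen here for illustration, and the `toy_` names are hypothetical.

```r
# Hypothetical adjacency matrix: nodes 1, 3, 5 form one triangle,
# nodes 2, 4, 6 form another, and the two groups are interleaved.
toy_names <- paste0("n", 1:6)
toy_adj <- matrix(0, 6, 6, dimnames = list(toy_names, toy_names))
toy_adj[c(1, 3, 5), c(1, 3, 5)] <- 1
toy_adj[c(2, 4, 6), c(2, 4, 6)] <- 1
diag(toy_adj) <- 0

toy_adj   # in the original order the block structure is hard to see

# Order the nodes so that rows with similar connection patterns are adjacent.
toy_ord <- hclust(dist(toy_adj))$order
toy_adj[toy_ord, toy_ord]   # the two triangles now appear as two blocks
```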

1.3 Edge Bundling

One question: remember that we visualized the clusters in the text visualization chapter. Can we visualize the clustering information and the network connections in one plot?

Short answer: we can use edge bundling. Edge bundling visualizes adjacency relations between entities organized in a hierarchy. The idea is to bundle the adjacency edges together to reduce the clutter usually observed in complex networks.

1.3.1 Edge Bundling (Pride and Prejudice)

book_cluster <- readRDS(file = "./data/book_cluster.RDS")
den_hc <- as.dendrogram(book_cluster)
  
ggraph(den_hc, layout = 'dendrogram', circular = TRUE) + 
  geom_edge_link(alpha=0.8) +
  geom_node_text(aes(x = x, y=y, filter = leaf, label=label), size=4, alpha=1) +
  coord_fixed() +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(c(0,0,0,0),"cm"),
  ) +
  expand_limits(x = c(-1.2, 1.2), y = c(-1.2, 1.2))

library(dendextend)

## 
## ---------------------
## Welcome to dendextend version 1.15.1
## Type citation('dendextend') for how to cite the package.
## 
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
## 
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## Or contact: <tal.galili@gmail.com>
## 
##  To suppress this message use:  suppressPackageStartupMessages(library(dendextend))
## ---------------------

## 
## Attaching package: 'dendextend'

## The following object is masked from 'package:stats':
## 
##     cutree

book_edge = as_edgelist(book_graphNetwork)
# The connection object must refer to the ids of the leaves:
from=match(book_edge[,1],get_nodes_attr(den_hc,"label"))
to=match(book_edge[,2],get_nodes_attr(den_hc,"label"))

# Make the plot
ggraph(den_hc,layout='dendrogram',circular=TRUE)+ 
  geom_edge_link(alpha=0.3) +
  geom_conn_bundle(data=get_con(from=from,to=to),alpha= 0.8, colour="#69b3a2") + 
  geom_node_text(aes(x = x, y=y, filter = leaf, label=label), size=4, alpha=1) +
  coord_fixed() +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(c(0,0,0,0),"cm"),
  ) +
  expand_limits(x = c(-1.2, 1.2), y = c(-1.2, 1.2))

1.3.2 Edge Bundling (flare)

edges=flare$edges
head(edges)

##                      from                                           to
## 1 flare.analytics.cluster flare.analytics.cluster.AgglomerativeCluster
## 2 flare.analytics.cluster   flare.analytics.cluster.CommunityStructure
## 3 flare.analytics.cluster  flare.analytics.cluster.HierarchicalCluster
## 4 flare.analytics.cluster            flare.analytics.cluster.MergeEdge
## 5   flare.analytics.graph  flare.analytics.graph.BetweennessCentrality
## 6   flare.analytics.graph           flare.analytics.graph.LinkDistance

vertices=flare$vertices%>%arrange(name)%>%mutate(name=factor(name,name))
head(vertices)

##                                           name size            shortName
## 1                                        flare    0                flare
## 2                              flare.analytics    0            analytics
## 3                      flare.analytics.cluster    0              cluster
## 4 flare.analytics.cluster.AgglomerativeCluster 3938 AgglomerativeCluster
## 5   flare.analytics.cluster.CommunityStructure 3812   CommunityStructure
## 6  flare.analytics.cluster.HierarchicalCluster 6714  HierarchicalCluster

#Preparation to draw labels properly:
vertices$id=NA
myleaves=which(is.na(match(vertices$name,edges$from)))
nleaves=length(myleaves)

vertices$id[myleaves]=seq(1:nleaves)
vertices$angle=90-360*vertices$id/nleaves
vertices$hjust=ifelse(vertices$angle < -90, 1,0)
vertices$angle=ifelse(vertices$angle < -90,vertices$angle+180,vertices$angle)
head(vertices)

##                                           name size            shortName id
## 1                                        flare    0                flare NA
## 2                              flare.analytics    0            analytics NA
## 3                      flare.analytics.cluster    0              cluster NA
## 4 flare.analytics.cluster.AgglomerativeCluster 3938 AgglomerativeCluster  1
## 5   flare.analytics.cluster.CommunityStructure 3812   CommunityStructure  2
## 6  flare.analytics.cluster.HierarchicalCluster 6714  HierarchicalCluster  3
##      angle hjust
## 1       NA    NA
## 2       NA    NA
## 3       NA    NA
## 4 88.36364     0
## 5 86.72727     0
## 6 85.09091     0
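The label preparation above can be read as a small function. The sketch below wraps the same three formulas in a hypothetical helper, `label_angle()`, and evaluates it for a few leaf ids. The value nleaves = 220 is inferred from the printed angle of the first leaf (90 - 360/220 is about 88.36); it is not stated in the text.

```r
# Hypothetical helper reproducing the angle and hjust computation above.
label_angle <- function(id, nleaves) {
  angle <- 90 - 360 * id / nleaves             # start at the top, go clockwise
  hjust <- ifelse(angle < -90, 1, 0)           # right-align labels on the left half
  angle <- ifelse(angle < -90, angle + 180, angle)  # flip them so they stay readable
  data.frame(id = id, angle = angle, hjust = hjust)
}

toy_labels <- label_angle(c(1, 110, 165), nleaves = 220)
toy_labels
```

Leaves whose raw angle falls below -90 would be drawn upside down, so they are rotated by 180 degrees and anchored at their other end.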

# Build a network object from this dataset:
mygraph=graph_from_data_frame(edges,vertices=vertices)

The clustering.

# Basic dendrogram
ggraph(mygraph,layout='dendrogram',circular=TRUE)+ 
    geom_edge_link(edge_width=0.4,alpha=0.1)+
    geom_node_text(aes(x=x*1.01,y=y*1.01,filter=leaf,label=shortName,angle=angle-90,hjust=hjust),size=1.5,alpha=0.5) +
    coord_fixed() +
    theme_void() +
    theme(
      legend.position="none",
      plot.margin=unit(c(0,0,0,0),"cm"),
    ) +
    expand_limits(x=c(-1.2, 1.2),y=c(-1.2, 1.2))

The network with clustering information.

connections=flare$imports
# The connection object must refer to the ids of the leaves:
from=match(connections$from,vertices$name)
to=match(connections$to,vertices$name)

# Make the plot
ggraph(mygraph,layout='dendrogram',circular=TRUE)+ 
    geom_conn_bundle(data=get_con(from=from,to=to),alpha= 0.1, colour="#69b3a2") + 
    geom_node_text(aes(x=x*1.01,y=y*1.01,filter=leaf,label=shortName,angle = angle-90,hjust=hjust),size=1.5,alpha=1) +
    coord_fixed()+
    theme_void()+
    theme(
      legend.position="none",
      plot.margin=unit(c(0,0,0,0),"cm"),
    ) +
    expand_limits(x = c(-1.2, 1.2), y = c(-1.2, 1.2))

1.4 Arc Diagram

In arc diagrams, nodes are displayed along a single axis and links are represented with arcs.

Compared to the 2-d visualizations presented above, an arc diagram displays the label of each node clearly, which is often impossible in a 2-d layout. Another merit of the arc diagram is that it can make use of clustering information if the node order is chosen wisely.

ggraph(book_graphNetwork,layout="linear")+
  geom_edge_arc(aes(width=weight/60,color=factor(from)),alpha=0.6,show.legend=FALSE) +
  geom_node_text(aes(label=name),repel=FALSE, size=4.5,angle = 320)+
  guides(fill="none")+
  theme_graph()

1.5 Hive Diagram

An extension to arc diagrams is the hive plot, where instead of the nodes being laid out along a single one-dimensional axis they are laid out along multiple axes. This can help reveal more complex clusters (if the nodes represent connected people, imagine for example laying out nodes along axes of both “income” and “ethnicity”).

Here’s an example of a hive plot on the Pride and Prejudice network:

graph <- as_tbl_graph(book_graphNetwork) %>% 
  mutate(degree = centrality_degree())
age=c("old","young","young","young","young",
          "old","kid","kid","young","young","young",
          "young","old","young","young","old","old")
ggraph(graph, 'hive', axis = age) + 
  geom_edge_hive(colour = 12,label_colour = 2) + 
  geom_axis_hive(aes(colour = age), size = 2, label = FALSE) + 
  geom_node_label(aes(label=name),repel=F, size=2.5) + 
  coord_fixed()

The hive plot is particularly useful for graphs with many nodes and edges, which look like a dense “hairball” in traditional graph layouts. For example, we can draw a hive diagram of the highschool dataset with students grouped by their number of friends:

highschool_graph <- as_tbl_graph(highschool) %>% 
  mutate(degree = centrality_degree())

highschool_graph <- highschool_graph %>% 
  mutate(friends = ifelse(
    centrality_degree(mode = 'in') < 5, 'few',
    ifelse(centrality_degree(mode = 'in') >= 15, 'many', 'medium')
  ))
ggraph(highschool_graph, 'hive', axis = friends, sort.by = degree) + 
  geom_edge_hive(aes(colour = factor(year))) + 
  geom_axis_hive(aes(colour = friends), size = 2, label = FALSE) + 
  coord_fixed()

Please note that connections between nodes on the same axis are ignored in the visualization.
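The nested ifelse() that assigns students to the “few”, “medium” and “many” axes can also be written with base R's cut(). The sketch below uses a made-up vector of in-degrees (`toy_in_degree`) and checks that both versions give the same labels. Whether cut() reads better is a matter of taste.

```r
# Hypothetical in-degrees, including the boundary values 5 and 15.
toy_in_degree <- c(0, 4, 5, 14, 15, 30)

# Nested ifelse(), as in the hive plot code above.
toy_nested <- ifelse(
  toy_in_degree < 5, "few",
  ifelse(toy_in_degree >= 15, "many", "medium")
)

# Same binning with cut(): intervals [-Inf, 5), [5, 15), [15, Inf).
toy_binned <- cut(
  toy_in_degree,
  breaks = c(-Inf, 5, 15, Inf),
  labels = c("few", "medium", "many"),
  right = FALSE
)

identical(as.character(toy_binned), toy_nested)
```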

1.6 Flow diagram: Sankey Diagram and Chord Diagram

Flow diagram is a collective term for a diagram representing a flow or a set of dynamic relationships in a system. Some flow diagrams are very helpful for visualizing networks or network-like datasets.

We will use the 1960 - 1970 population migration data, which displays the number of people migrating from one country to another. Data used comes from this publication: https://onlinelibrary.wiley.com/doi/abs/10.1111/imre.12327

1.6.1 Sankey Diagram

Sankey diagrams are a type of flow diagram in which the width of the arrows is proportional to the flow rate.

Sankey diagrams can also visualize energy accounts, material flow accounts at a regional or national level, and cost breakdowns. They emphasize the major transfers or flows within a system and help locate the most important contributions to a flow. They often show conserved quantities within defined system boundaries.

We will use our old friend, R package networkD3 here.

# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
# Package
library(networkD3)
library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v tibble  3.1.2     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## v purrr   0.3.4

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::as_data_frame() masks tibble::as_data_frame(), igraph::as_data_frame()
## x purrr::compose()       masks igraph::compose()
## x tidyr::crossing()      masks igraph::crossing()
## x dplyr::filter()        masks plotly::filter(), tidygraph::filter(), stats::filter()
## x dplyr::groups()        masks plotly::groups(), tidygraph::groups(), igraph::groups()
## x dplyr::lag()           masks stats::lag()
## x purrr::simplify()      masks igraph::simplify()

# I need a long format
data_long <- data %>%
  rownames_to_column %>%
  gather(key = 'key', value = 'value', -rowname) %>%
  filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
data_long$target <- paste(data_long$target, " ", sep="")

# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source), as.character(data_long$target)) %>% unique())

# With networkD3, connections must be provided using ids, not the real names used in the links data frame, so we need to reformat it.
data_long$IDsource=match(data_long$source, nodes$name)-1 
data_long$IDtarget=match(data_long$target, nodes$name)-1

# prepare colour scale
ColourScal ='d3.scaleOrdinal() .range(["#FDE725FF","#B4DE2CFF","#6DCD59FF","#35B779FF","#1F9E89FF","#26828EFF","#31688EFF","#3E4A89FF","#482878FF","#440154FF"])'

# Make the Network
sankeyNetwork(Links = data_long, Nodes = nodes,
                     Source = "IDsource", Target = "IDtarget",
                     Value = "value", NodeID = "name", 
                     sinksRight=FALSE, colourScale=ColourScal, nodeWidth=40, fontSize=13, nodePadding=20)
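The reshaping step (wide adjacency matrix to one row per flow) can also be done in base R. The sketch below uses a made-up 2 x 2 flow matrix (`toy_flows`); it is an alternative to the rownames_to_column() and gather() pipeline above, not a replacement for it.

```r
# Hypothetical flow matrix: rows are sources, columns are targets.
toy_flows <- matrix(
  c(0, 2,
    1, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(source = c("North", "South"), target = c("North", "South"))
)

# as.table() + as.data.frame() gives one row per (source, target) cell.
toy_long <- as.data.frame(
  as.table(toy_flows),
  responseName = "value",
  stringsAsFactors = FALSE
)

# Keep only the cells with an actual flow, as filter(value > 0) does above.
toy_long <- toy_long[toy_long$value > 0, ]
toy_long
```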

For a non-interactive Sankey plot, one can use the R package riverplot.

1.6.2 Chord Diagram

A chord diagram is a graphical method of displaying the inter-relationships between data in a matrix. The data are arranged radially around a circle with the relationships between the data points typically drawn as arcs connecting the data.

The format can be aesthetically pleasing, making it a popular choice in the world of data visualization.

The primary use of chord diagrams is to show the flows or connections between several entities (called nodes). Each entity is represented by a fragment (often colored or patterned) along the circumference of the circle. Arcs are drawn between entities to show flows (and exchanges in economics). The thickness of the arc is proportional to the significance of the flow.

We will use the R package circlize here:

library(circlize)

## ========================================
## circlize version 0.4.12
## CRAN page: https://cran.r-project.org/package=circlize
## Github page: https://github.com/jokergoo/circlize
## Documentation: https://jokergoo.github.io/circlize_book/book/
## 
## If you use it in published research, please cite:
## Gu, Z. circlize implements and enhances circular visualization
##   in R. Bioinformatics 2014.
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(circlize))
## ========================================

## 
## Attaching package: 'circlize'

## The following object is masked from 'package:igraph':
## 
##     degree

# short names
colnames(data) <- c("Africa", "East Asia", "Europe", "Latin Ame.",   "North Ame.",   "Oceania", "South Asia", "South East Asia", "Soviet Union", "West.Asia")
rownames(data) <- colnames(data)

# I need a long format
data_long <- data %>%
  rownames_to_column %>%
  gather(key = 'key', value = 'value', -rowname)

# parameters
circos.clear()
circos.par(start.degree = 90, gap.degree = 4, track.margin = c(-0.1, 0.1), points.overflow.warning = FALSE)
par(mar = rep(0, 4))

# color palette
mycolor <- viridis(10, alpha = 1, begin = 0, end = 1, option = "D")
mycolor <- mycolor[sample(1:10)]



# Base plot
chordDiagram(
  x = data_long, 
  grid.col = mycolor,
  transparency = 0.25,
  directional = 1,
  direction.type = c("arrows", "diffHeight"), 
  diffHeight  = -0.04,
  annotationTrack = "grid", 
  annotationTrackHeight = c(0.05, 0.1),
  link.arr.type = "big.arrow", 
  link.sort = TRUE, 
  link.largest.ontop = TRUE)

## Note: The second link end is drawn out of sector 'Europe'.

## Note: The second link end is drawn out of sector 'South East Asia'.

# Add text and axis
circos.trackPlotRegion(
  track.index = 1, 
  bg.border = NA, 
  panel.fun = function(x, y) {
    
    xlim = get.cell.meta.data("xlim")
    sector.index = get.cell.meta.data("sector.index")
    
    # Add names to the sector. 
    circos.text(
      x = mean(xlim), 
      y = 3.2, 
      labels = sector.index, 
      facing = "bending", 
      cex = 0.8
      )

    # Add graduation on axis
    circos.axis(
      h = "top", 
      major.at = seq(from = 0, to = xlim[2], by = ifelse(test = xlim[2]>10, yes = 2, no = 1)), 
      minor.ticks = 1, 
      major.tick.length = 0.5,
      labels.niceFacing = FALSE)
  }
)

1.7 Additional Examples

Package

library(ggraph)
library(igraph)
library(networkD3)

1.7.1 Friendship Network

course<-read.csv("data/course.csv")
networkfriend<-read.csv("data/network.csv")
networkfriend<-as.data.frame(networkfriend)

Now we try the network.csv dataset. It is about the relationships among a group of people.

#rownames
rownames(networkfriend)<-colnames(networkfriend)
#transfer to adjacency matrix
networkfriend<-as.matrix(networkfriend)
head(networkfriend)

##       Joe Judy Tom Sarah Alice Emily Bob Tim Peter Helen Jack Kate Jane Jim
## Joe     0    1   1     0     1     1   0   1     0     0    1    0    0   1
## Judy    1    0   0     1     0     0   1   0     0     0    0    0    0   0
## Tom     1    0   0     0     1     0   0   0     0     0    0    0    0   0
## Sarah   0    1   0     0     0     0   0   0     1     1    0    0    0   0
## Alice   1    0   1     0     0     0   0   0     0     0    0    0    0   0
## Emily   1    0   0     0     0     0   0   0     0     1    0    0    0   0
##       Alma Amy Anna Bella Cherry Daisy Ella Grace Jenny Jessica Kitty Linda
## Joe      0   0    0     1      0     0    0     1     0       0     1     0
## Judy     0   0    0     0      0     0    0     0     0       0     0     0
## Tom      0   1    0     0      0     1    0     0     0       0     0     0
## Sarah    0   0    0     0      0     0    0     0     0       1     0     1
## Alice    0   0    0     0      0     0    0     0     1       0     0     0
## Emily    0   0    0     0      0     0    1     0     0       0     0     0
##       Lisa Mary Tina Ben Andrew Bill Carl David Frank Jason John Kim Mark Nick
## Joe      0    0    0   1      0    0    0     0     1     0    1   0    0    1
## Judy     0    0    0   0      0    0    0     0     0     1    0   0    0    0
## Tom      0    0    1   0      0    0    0     1     0     0    0   0    1    0
## Sarah    0    0    0   0      1    0    0     0     0     1    0   0    0    1
## Alice    0    0    0   0      0    0    0     0     0     0    0   0    0    0
## Emily    0    1    0   0      0    0    0     0     1     1    0   0    0    0
##       Steven Simon Robert Neil
## Joe        0     0      1    0
## Judy       1     0      0    0
## Tom        0     1      0    1
## Sarah      1     0      0    0
## Alice      0     0      0    0
## Emily      0     0      0    0

networkfriend=graph_from_adjacency_matrix(networkfriend, weighted=TRUE)

# Plot it
# Make the graph
ggraph(networkfriend, layout="fr") + 
  geom_edge_link(edge_colour="black", edge_alpha=0.2) +
  geom_node_point( color="#69b3a2", size=3) +
  #scale_edge_width(range=c(1,3)) +
  geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )

ggraph(networkfriend, layout="kk") + 
  geom_edge_link(edge_colour="black",edge_alpha=0.2) +
  geom_node_point(color="#69b3a2",size=3) +
  #scale_edge_width(range=c(1,3)) +
  geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )

ggraph(networkfriend, layout="drl") + 
  geom_edge_link(edge_colour="black",edge_alpha=0.2) +
  geom_node_point(color="#69b3a2",size=3) +
  #scale_edge_width(range=c(1,3)) +
  geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )

ggraph(networkfriend, layout="igraph",algorithm="randomly") + 
  geom_edge_link(edge_colour="black",edge_alpha=0.2) +
  geom_node_point(color="#69b3a2", size=3) +
  #scale_edge_width(range=c(1,3)) +
  geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
  theme_void() +
  theme(
    legend.position="none",
    plot.margin=unit(rep(1,4), "cm")
  )

“fr” (Fruchterman-Reingold), “kk” (Kamada-Kawai), and “drl” are force-directed algorithms, while the fourth plot arranges the nodes at random. The first two layouts clearly show the topological structure of the social relationships. The random arrangement is not desirable, which shows the importance of node layout. The plots show that the participants are mainly Joe’s friends. In addition, many people know David, Emily, Nick, Judy, Sarah, Bob, Simon, Steven, Tom, Alma, Jane, Linda, Anna, and Robert, while John has met only one of them.

Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically pleasing way. They position the nodes of a graph in two- or three-dimensional space so that all edges have roughly equal length and as few edges as possible cross. To do so, they assign forces among the edges and nodes based on their relative positions, and then use these forces either to simulate the motion of the edges and nodes or to minimize their energy.
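To make the idea of forces concrete, here is a minimal sketch in base R. It is not the code behind igraph's "fr" or "kk" layouts. The function name `simple_force_layout`, its parameters, and the toy path graph are all assumptions made for illustration: every pair of nodes repels, linked nodes attract, and the positions are nudged along the net force for a fixed number of iterations.

```r
# Minimal force-directed layout in base R (illustrative only, not igraph's code).
# adj: symmetric 0/1 adjacency matrix. Returns an n x 2 matrix of coordinates.
simple_force_layout <- function(adj, iterations = 300, k = 1, step = 0.05, seed = 1) {
  set.seed(seed)
  n <- nrow(adj)
  pos <- matrix(runif(2 * n, min = -1, max = 1), ncol = 2)  # random start
  for (iter in seq_len(iterations)) {
    disp <- matrix(0, nrow = n, ncol = 2)  # net force acting on each node
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        if (i == j) next
        delta <- pos[i, ] - pos[j, ]
        d <- max(sqrt(sum(delta^2)), 1e-6)  # avoid division by zero
        unit <- delta / d
        disp[i, ] <- disp[i, ] + unit * (k^2 / d)    # every pair repels
        if (adj[i, j] != 0) {
          disp[i, ] <- disp[i, ] - unit * (d^2 / k)  # linked nodes attract
        }
      }
    }
    len <- pmax(sqrt(rowSums(disp^2)), 1e-6)
    move <- pmin(len * step, 0.1)  # cap the displacement per iteration
    pos <- pos + disp / len * move
  }
  pos
}

# Toy path graph a - b - c (made-up data)
adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0), nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
coords <- simple_force_layout(adj)
dist(coords)  # pairwise distances between the three nodes
```

In this toy graph the two unlinked end nodes should end up farther apart than either linked pair, which is the behaviour that makes force-directed layouts reveal structure.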

1.7.2 Plot igraph Object

library(igraph)
library(ggraph)
library(networkD3)

Preparation: we use the novel “长安十二时辰” (The Longest Day in Chang'an) as an example of quantified relationships.

Use igraph to draw the relationship plot

edge <- read.table('data/12edge.txt', header=TRUE, fileEncoding="UTF-16") # load the edge list
net <- graph_from_data_frame(d=edge, directed=FALSE) # create an undirected network
plot(net) # default layout

plot(net, 
     layout=layout_with_fr) # Fruchterman-Reingold layout

plot(net, 
     layout=layout_nicely) # let igraph choose a suitable layout automatically

E(net)$weight <- edge$relation # use the relation column as the edge weight
plot(net, 
     layout=layout_nicely, 
     vertex.shape="none") # hide the node shapes and keep only the labels

plot(net, 
     layout=layout_nicely, 
     vertex.shape="none", 
     edge.width=E(net)$weight) # map the weight to the edge width

plot(net, 
     layout=layout_nicely, 
     vertex.shape="none",
     edge.width=E(net)$weight/20) # scale the widths down so that thick edges do not dominate

plot(net, 
     layout=layout_nicely, 
     vertex.shape="none",
     edge.width=E(net)$weight/20, 
     vertex.label.cex=0.4) # adjust the size of the labels

plot(net, 
     layout=layout_nicely, 
     vertex.shape="none",
     edge.width=E(net)$weight/20, 
     vertex.label.cex=0.4, 
     vertex.label.color='black') # change the color of the labels
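The divisor 20 above is tuned by hand for this data set. As an alternative, the weights could be rescaled linearly to a fixed range of widths. The sketch below uses a hypothetical helper, `rescale_width`, and toy weights standing in for `E(net)$weight`; neither is part of igraph.

```r
# Hypothetical helper: linearly rescale edge weights to a chosen width range.
rescale_width <- function(w, to = c(0.5, 5)) {
  rng <- range(w, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(mean(to), length(w)))  # all weights equal: use the middle width
  }
  (w - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1]) + to[1]
}

# Toy weights standing in for E(net)$weight
w <- c(10, 40, 100)
rescale_width(w)
```

With the network from this section, the helper could be used as `plot(net, edge.width = rescale_width(E(net)$weight))`, so that the thinnest and thickest edges stay within the chosen range regardless of the scale of the weights.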

1.7.3 Plot via ggraph

net <- graph_from_data_frame(d=edge, directed=FALSE) # undirected
ggraph(net) +
  geom_edge_link() +   # add edge
  geom_node_point()    # add node

## Using `stress` as default layout

ggraph(net, layout='graphopt')+
  geom_node_text(aes(label = name)) # use the "graphopt" layout and draw node names as plain text

ggraph(net, layout='graphopt')+
  geom_node_label(aes(label = name)) # draw node names as boxed labels instead

ggraph(net, layout='graphopt')+
  geom_node_label(aes(label = name), repel=TRUE) # repel the labels so that they do not overlap

ggraph(net, layout='kk')+
  geom_node_label(aes(label = name)) ## use "kk" to adjust the layout

ggraph(net, layout='kk')+
  geom_node_label(aes(label = name)) +
  geom_edge_link() # add edge

ggraph(net, layout='kk')+
  geom_node_label(aes(label = name)) +
  geom_edge_link(alpha = 0.3, colour = '#377EB8') #Change the transparency and color of the edges

ggraph(net, layout='kk')+
  geom_node_label(aes(label = name)) +
  geom_edge_link(alpha = 0.3, colour = '#377EB8') + 
  theme_graph(background = 'white') # use a white background

E(net)$weight <- edge$relation
ggraph(net, layout='kk')+
  geom_node_label(aes(label = name)) +
  geom_edge_link(aes(edge_width = weight), alpha = 0.3, colour = '#377EB8') + 
  theme_graph(background = 'white') # map the relation (stored as weight) to the edge width

E(net)$weight <- edge$relation
ggraph(net, layout='graphopt')+
  geom_node_label(aes(label = name)) +
  geom_edge_link(aes(edge_width = weight), alpha = 0.3, colour = '#377EB8') + 
  theme_graph(background = 'white') + 
  theme(legend.position = "none") # switch to the "graphopt" layout and hide the legend

ggraph(net, layout='graphopt')+
  geom_node_label(aes(label = name)) + 
  geom_edge_arc(aes(edge_width = weight), alpha = 0.25, colour = '#377EB8') + 
  theme_graph(background = 'white') + 
  theme(legend.position = "none") # draw the edges as arcs

# circular layout with straight edges
ggraph(net, layout='linear', circular=TRUE)+
  geom_node_label(aes(label = name)) +
  geom_edge_link(aes(edge_width = weight), alpha = 0.25, colour = '#377EB8') + 
  theme_graph(background = 'white') + 
  theme(legend.position = "none")

# circular layout with arcs
ggraph(net, layout='linear', circular=TRUE)+
  geom_node_label(aes(label = name)) +
  geom_edge_arc(aes(edge_width = weight), alpha = 0.25, colour = '#377EB8') + 
  theme_graph(background = 'white') + 
  theme(legend.position = "none")

# Label sizes scaled by node degree. The values are hard-coded here; they
# correspond to the second commented formula below.
#lab_size=as.vector(degree(net))
#lab_size=degree(net, normalized = TRUE) * 10 + 2
lab_size=c(7.0, 9.0, 8.5, 7.0, 9.0, 6.0, 12.5, 6.0, 11.5, 10.0, 5.0, 8.5, 8.0, 9.5, 6.5, 10.5, 7.5, 9.5, 8.5, 7.0, 3.0)
ggraph(net, layout='graphopt')+
  geom_node_label(aes(label = name), size=lab_size) +
  geom_edge_link(aes(edge_width = weight), alpha = 0.3, colour = '#377EB8') + 
  theme_graph(background = 'white') + 
  theme(legend.position = "none") # the label size reflects the degree of each node
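The commented formula `degree(net, normalized = TRUE) * 10 + 2` divides each degree by the number of other nodes and maps the result to a label size. The sketch below reproduces that arithmetic in base R on a made-up edge list (`toy_edge` and its column names are assumptions, not the novel's data), which may help when checking hard-coded sizes like the ones above.

```r
# Toy edge list (made-up names, not the novel's data)
toy_edge <- data.frame(from = c("a", "a", "a"),
                       to   = c("b", "c", "d"),
                       stringsAsFactors = FALSE)

# Degree = number of edge endpoints that touch each node
deg <- table(c(toy_edge$from, toy_edge$to))

# Normalize by (n - 1), then map to a label size as in the commented formula
n <- length(deg)
lab_size_toy <- as.vector(deg) / (n - 1) * 10 + 2
names(lab_size_toy) <- names(deg)
lab_size_toy
```

Note that `table()` orders the nodes alphabetically, whereas the sizes passed to `geom_node_label()` must follow the vertex order of the graph, so with a real network the result would need to be reordered, for example by indexing it with `V(net)$name`.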

1.7.4 Interactive Visualization by networkD3

net <- graph_from_data_frame(d=edge, directed=FALSE) # undirected

netD3 <- igraph_to_networkD3(net) # convert the igraph object to the networkD3 format
netD3$nodes$group <- '1' # put all nodes in a single group
#netD3$nodes$degree <- degree(net) # add degree in the data frame
netD3$nodes$degree <- c(10, 14, 13, 10, 14,  8, 21,  8, 19, 16,  6, 13, 12, 15,  9, 17, 11, 15, 13, 10,  2) # node degrees, hard-coded
forceNetwork(Links = netD3$links, Nodes = netD3$nodes, 
             Source = 'source', Target = 'target', 
             NodeID = 'name', Group = 'group', 
             Value = 'value', 
             Nodesize = 'degree', 
             #fontFamily = '黑体', # optional Chinese font (Heiti)
             charge = -100, 
             zoom = TRUE, 
             bounded = FALSE, 
             opacityNoHover = 1) # keep the node labels visible without hovering
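forceNetwork() expects the links to refer to nodes by zero-based row position in the nodes data frame, which is what igraph_to_networkD3() prepares. The following base R sketch builds the same kind of `links` and `nodes` data frames by hand from a made-up edge list; `toy_edge` and its column names are assumptions for illustration.

```r
# Toy edge list (made-up data); the column names are assumptions for this sketch
toy_edge <- data.frame(from = c("a", "a", "b"),
                       to = c("b", "c", "c"),
                       relation = c(3, 1, 2),
                       stringsAsFactors = FALSE)

# One row per node, in order of first appearance
node_names <- unique(c(toy_edge$from, toy_edge$to))
nodes <- data.frame(name = node_names, group = "1", stringsAsFactors = FALSE)

# Links refer to nodes by zero-based row position in `nodes`
links <- data.frame(source = match(toy_edge$from, node_names) - 1,
                    target = match(toy_edge$to, node_names) - 1,
                    value = toy_edge$relation)
links
```

These two data frames could then be passed as `Links = links, Nodes = nodes` with the same `Source`, `Target`, `NodeID`, `Group`, and `Value` arguments used above.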

Ch11 Network Visualization

Descriptive Analytics and Data Visualization

Yichen Qin (qinyn@ucmail.uc.edu), University of Cincinnati

2021-08-16