Libraries
The following code will check whether the required packages have been installed. If not, it will automatically install them from CRAN.
pkgs <- c(
"igraph",
"ggraph",
"tidygraph",
"networkD3",
"heatmaply",
"dendextend",
"circlize"
)
missing_pkgs <- pkgs[!(pkgs %in% installed.packages()[, "Package"])]
if (length(missing_pkgs) > 0) {
install.packages(missing_pkgs)
}
Network diagrams (also called Graphs) show interconnections between a set of entities. Each entity is represented as a Node (or vertex) while connections between nodes are represented through links (or edges).
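To make the vocabulary concrete, here is a minimal base-R sketch with a made-up four-person network (the names and ties are hypothetical and do not come from the datasets used in this chapter). It stores the links as an edge list and derives the node set and the adjacency matrix from it.

```r
# Hypothetical toy network: four people and their undirected ties
edges <- data.frame(
  from = c("Ann", "Ann", "Bob", "Cat"),
  to   = c("Bob", "Cat", "Cat", "Dan"),
  stringsAsFactors = FALSE
)

# Nodes are the unique entities that appear at either end of an edge
nodes <- sort(unique(c(edges$from, edges$to)))

# Adjacency matrix: entry [i, j] is 1 when node i and node j are linked
adj <- matrix(0, nrow = length(nodes), ncol = length(nodes),
              dimnames = list(nodes, nodes))
idx <- cbind(match(edges$from, nodes), match(edges$to, nodes))
adj[idx] <- 1
adj[idx[, 2:1]] <- 1  # mirror the entries, because the ties are undirected

print(adj)

# Row sums give the degree (number of ties) of each node
node_degree <- rowSums(adj)
print(node_degree)
```

The edge list and the adjacency matrix carry the same information; which one is more convenient depends on the plotting function.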
Another way to visualize the network is to visualize its adjacency matrix directly!
library(heatmaply)
heatmaply(book_cooc,
dendrogram = "both",
xlab = "", ylab = "",
main = "",
scale = "none",
margins = c(60,100,40,20),
grid_color = "white",
grid_width = 0.0000000001,
titleX = FALSE,
hide_colorbar = TRUE,
branches_lwd = 0.1,
label_names = c("Name", "With:", "Value"),
fontsize_row = 7, fontsize_col = 7,
labCol = colnames(book_cooc),
labRow = rownames(book_cooc),
heatmap_layers = theme(axis.line=element_blank())
)
heatmaply(book_cooc,
dendrogram = "none",
xlab = "", ylab = "",
main = "",
scale = "column",
margins = c(60,100,40,20),
grid_color = "white",
grid_width = 0.00001,
titleX = FALSE,
hide_colorbar = TRUE,
branches_lwd = 0.1,
label_names = c("From", "To:", "Value"),
fontsize_row = 7, fontsize_col = 7,
labCol = colnames(book_cooc),
labRow = rownames(book_cooc),
heatmap_layers = theme(axis.line=element_blank())
)
Or we can even visualize the matrix as another layout using ggraph.
ggraph(book_cooc, 'matrix', sort.by = node_rank_leafsort()) +
geom_edge_tile(mirror = TRUE) +
coord_fixed()
ggraph(book_cooc, 'matrix', sort.by = node_rank_leafsort()) +
geom_edge_point() +
coord_fixed()
ggraph(book_cooc, 'matrix', sort.by = node_rank_leafsort()) +
geom_edge_bend() +
coord_fixed()
This visualization is not only straightforward; it also scales well. Many network layouts suffer from poor scalability: edges eventually overlap to the extent that the plot becomes unintelligible, whereas a matrix view avoids overlapping edges completely.
At the same time, this visualization shows a very different pattern from a topology plot, and the node order now has a big influence on the look of the plot.
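The effect of node order can be seen in a small base-R sketch. The network below is made up (two groups of three nodes, stored in a deliberately interleaved order), and sorting by hierarchical clustering of the rows is just one possible ordering rule, not necessarily the one used by `node_rank_leafsort()`.

```r
# Hypothetical toy network: two tight groups, {A, B, C} and {D, E, F},
# stored in a deliberately interleaved node order
nodes <- c("A", "D", "B", "E", "C", "F")
adj <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
ties <- rbind(c("A", "B"), c("A", "C"), c("B", "C"),
              c("D", "E"), c("D", "F"), c("E", "F"))
adj[ties] <- 1
adj[ties[, 2:1]] <- 1  # undirected, so fill both triangles of the matrix

# Reorder rows and columns so that nodes with similar connection
# patterns sit next to each other (complete-linkage clustering on rows)
ord <- hclust(dist(adj))$order
adj_sorted <- adj[ord, ord]

print(adj)         # interleaved order: the 1s look scattered
print(adj_sorted)  # clustered order: the 1s form two blocks along the diagonal
```

Both matrices describe exactly the same network; only the row and column order differs.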
One question: remember that we visualized the clusters in the text visualization chapter. Can we visualize the clustering information and the network connections in one plot?
Short answer: we can use edge bundling. Edge bundling visualizes adjacency relations between entities that are organized in a hierarchy. The idea is to bundle the adjacency edges together to reduce the clutter usually observed in complex networks.
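A rough way to see why bundling works: in hierarchical edge bundling, each connection between two leaves is drawn as a smooth curve guided by the path through the hierarchy that joins them, so connections that share part of that path are pulled together. The base-R sketch below only computes that guiding path for a made-up hierarchy; the node names and the helper functions `path_to_root()` and `tree_path()` are hypothetical and are not part of ggraph.

```r
# Hypothetical hierarchy stored as a child -> parent lookup
parent <- c(root = NA, g1 = "root", g2 = "root",
            a = "g1", b = "g1", c = "g2", d = "g2")

# All ancestors of a node, from the node itself up to the root
path_to_root <- function(node) {
  path <- node
  while (!is.na(parent[[node]])) {
    node <- parent[[node]]
    path <- c(path, node)
  }
  path
}

# Path between two leaves: up to the lowest common ancestor, then down
tree_path <- function(from, to) {
  up   <- path_to_root(from)
  down <- path_to_root(to)
  lca  <- up[up %in% down][1]
  c(up[seq_len(match(lca, up))],
    rev(down[seq_len(match(lca, down) - 1)]))
}

print(tree_path("a", "b"))  # stays inside group g1
print(tree_path("a", "d"))  # has to pass through the root
```

Connections between leaves of the same group get short guiding paths, while connections between groups are routed through the shared ancestors, which is what produces the visible bundles.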
book_cluster <- readRDS(file = "./data/book_cluster.RDS")
den_hc <- as.dendrogram(book_cluster)
ggraph(den_hc, layout = 'dendrogram', circular = TRUE) +
geom_edge_link(alpha=0.8) +
geom_node_text(aes(x = x, y=y, filter = leaf, label=label), size=4, alpha=1) +
coord_fixed() +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(c(0,0,0,0),"cm"),
) +
expand_limits(x = c(-1.2, 1.2), y = c(-1.2, 1.2))
library(dendextend)
book_edge = as_edgelist(book_graphNetwork)
# The connection object must refer to the ids of the leaves:
from=match(book_edge[,1],get_nodes_attr(den_hc,"label"))
to=match(book_edge[,2],get_nodes_attr(den_hc,"label"))
# Make the plot
ggraph(den_hc,layout='dendrogram',circular=TRUE)+
geom_edge_link(alpha=0.3) +
geom_conn_bundle(data=get_con(from=from,to=to),alpha= 0.8, colour="#69b3a2") +
geom_node_text(aes(x = x, y=y, filter = leaf, label=label), size=4, alpha=1) +
coord_fixed() +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(c(0,0,0,0),"cm"),
) +
expand_limits(x = c(-1.2, 1.2), y = c(-1.2, 1.2))
edges=flare$edges
head(edges)
## from to
## 1 flare.analytics.cluster flare.analytics.cluster.AgglomerativeCluster
## 2 flare.analytics.cluster flare.analytics.cluster.CommunityStructure
## 3 flare.analytics.cluster flare.analytics.cluster.HierarchicalCluster
## 4 flare.analytics.cluster flare.analytics.cluster.MergeEdge
## 5 flare.analytics.graph flare.analytics.graph.BetweennessCentrality
## 6 flare.analytics.graph flare.analytics.graph.LinkDistance
vertices=flare$vertices%>%arrange(name)%>%mutate(name=factor(name,name))
head(vertices)
## name size shortName
## 1 flare 0 flare
## 2 flare.analytics 0 analytics
## 3 flare.analytics.cluster 0 cluster
## 4 flare.analytics.cluster.AgglomerativeCluster 3938 AgglomerativeCluster
## 5 flare.analytics.cluster.CommunityStructure 3812 CommunityStructure
## 6 flare.analytics.cluster.HierarchicalCluster 6714 HierarchicalCluster
#Preparation to draw labels properly:
vertices$id=NA
myleaves=which(is.na(match(vertices$name,edges$from)))
nleaves=length(myleaves)
vertices$id[myleaves]=seq(1:nleaves)
vertices$angle=90-360*vertices$id/nleaves
vertices$hjust=ifelse(vertices$angle < -90, 1,0)
vertices$angle=ifelse(vertices$angle < -90,vertices$angle+180,vertices$angle)
head(vertices)
## name size shortName id
## 1 flare 0 flare NA
## 2 flare.analytics 0 analytics NA
## 3 flare.analytics.cluster 0 cluster NA
## 4 flare.analytics.cluster.AgglomerativeCluster 3938 AgglomerativeCluster 1
## 5 flare.analytics.cluster.CommunityStructure 3812 CommunityStructure 2
## 6 flare.analytics.cluster.HierarchicalCluster 6714 HierarchicalCluster 3
## angle hjust
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 88.36364 0
## 5 86.72727 0
## 6 85.09091 0
# Build a network object from this dataset:
mygraph=graph_from_data_frame(edges,vertices=vertices)
First, the clustering on its own:
# Basic dendrogram
ggraph(mygraph,layout='dendrogram',circular=TRUE)+
geom_edge_link(edge_width=0.4,alpha=0.1)+
geom_node_text(aes(x=x*1.01,y=y*1.01,filter=leaf,label=shortName,angle=angle-90,hjust=hjust),size=1.5,alpha=0.5) +
coord_fixed() +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(c(0,0,0,0),"cm"),
) +
expand_limits(x=c(-1.2, 1.2),y=c(-1.2, 1.2))
Next, the network together with the clustering information:
connections=flare$imports
# The connection object must refer to the ids of the leaves:
from=match(connections$from,vertices$name)
to=match(connections$to,vertices$name)
# Make the plot
ggraph(mygraph,layout='dendrogram',circular=TRUE)+
geom_conn_bundle(data=get_con(from=from,to=to),alpha= 0.1, colour="#69b3a2") +
geom_node_text(aes(x=x*1.01,y=y*1.01,filter=leaf,label=shortName,angle = angle-90,hjust=hjust),size=1.5,alpha=1) +
coord_fixed()+
theme_void()+
theme(
legend.position="none",
plot.margin=unit(c(0,0,0,0),"cm"),
) +
expand_limits(x = c(-1.2, 1.2), y = c(-1.2, 1.2))
In arc diagrams, nodes are displayed along a single axis and links are represented with arcs.
Compared with the 2-d visualizations presented above, an arc diagram displays the label of each node clearly, which is often impossible in a 2-d layout. Another merit of the arc diagram is that it can exploit clustering information if the node order is chosen wisely.
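One simple way to quantify a "wise" node order is the total horizontal span of all arcs: the shorter the arcs, the less they overlap. The base-R sketch below compares two orders for a made-up network with two groups; the edge list and the helper `total_arc_span()` are hypothetical and only illustrate the idea.

```r
# Hypothetical toy network with two groups, {A, B, C} and {D, E, F},
# joined by a single bridge C-D
edges <- data.frame(
  from = c("A", "A", "B", "D", "D", "E", "C"),
  to   = c("B", "C", "C", "E", "F", "F", "D"),
  stringsAsFactors = FALSE
)

# Total horizontal span of all arcs for a given left-to-right node order
total_arc_span <- function(node_order, edges) {
  span <- match(edges$from, node_order) - match(edges$to, node_order)
  sum(abs(span))
}

mixed_order     <- c("A", "D", "B", "E", "C", "F")  # groups interleaved
clustered_order <- c("A", "B", "C", "D", "E", "F")  # groups kept together

total_arc_span(mixed_order, edges)      # 19
total_arc_span(clustered_order, edges)  # 9
```

Keeping the members of a cluster next to each other roughly halves the total arc length here, which is why the clusters become visible in the diagram.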
ggraph(book_graphNetwork,layout="linear")+
geom_edge_arc(aes(width=weight/60,color=factor(from)),alpha=0.6,show.legend=F) +
geom_node_text(aes(label=name),repel=F, size=4.5,angle = 320)+
guides(fill=F)+
theme_graph()
An extension of the arc diagram is the hive plot: instead of laying the nodes out along a single one-dimensional axis, it lays them out along multiple axes. This can help reveal more complex clusters (if the nodes represent connected people, imagine, for example, laying out nodes along axes of both “income” and “ethnicity”).
Here’s an example of a hive plot of the Pride and Prejudice network:
graph <- as_tbl_graph(book_graphNetwork) %>%
mutate(degree = centrality_degree())
age=c("old","young","young","young","young",
"old","kid","kid","young","young","young",
"young","old","young","young","old","old")
ggraph(graph, 'hive', axis = age) +
geom_edge_hive(colour = 12,label_colour = 2) +
geom_axis_hive(aes(colour = age), size = 2, label = FALSE) +
geom_node_label(aes(label=name),repel=F, size=2.5) +
coord_fixed()
Hive plots are particularly useful for graphs with many nodes and edges, which look like a dense “hairball” in traditional graph layouts. As an example, we can draw a hive plot of the highschool dataset with the axes based on each student's number of friends:
highschool_graph <- as_tbl_graph(highschool) %>%
mutate(degree = centrality_degree())
highschool_graph <- highschool_graph %>%
mutate(friends = ifelse(
centrality_degree(mode = 'in') < 5, 'few',
ifelse(centrality_degree(mode = 'in') >= 15, 'many', 'medium')
))
ggraph(highschool_graph, 'hive', axis = friends, sort.by = degree) +
geom_edge_hive(aes(colour = factor(year))) +
geom_axis_hive(aes(colour = friends), size = 2, label = FALSE) +
coord_fixed()
Please note that the connections between nodes on the same axis are ignored in this visualization.
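To get a feeling for how much is left out, the base-R sketch below assigns nodes to axes by in-degree and counts the edges whose two ends fall on the same axis. The edge list and the cut points are made up for illustration and are not the ones used for the highschool data.

```r
# Hypothetical directed edge list (who names whom as a friend)
edges <- data.frame(
  from = c("a", "b", "c", "d", "e", "e", "f"),
  to   = c("b", "a", "a", "a", "a", "b", "e"),
  stringsAsFactors = FALSE
)
nodes <- sort(unique(c(edges$from, edges$to)))

# In-degree: how many times each node is named as a friend
in_degree <- table(factor(edges$to, levels = nodes))

# Assign every node to an axis; the cut points are arbitrary for this toy data
node_axis <- setNames(
  as.character(cut(as.vector(in_degree), breaks = c(-Inf, 0, 1, Inf),
                   labels = c("none", "few", "many"))),
  nodes
)

# Edges whose two ends sit on the same axis are the ones left out of the plot
same_axis <- unname(node_axis[edges$from] == node_axis[edges$to])
result <- data.frame(edges,
                     from_axis = unname(node_axis[edges$from]),
                     to_axis   = unname(node_axis[edges$to]),
                     drawn     = !same_axis)
print(result)
```

In this toy example two of the seven edges connect nodes on the same axis, so it is worth checking this count before reading too much into a hive plot.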
“Flow diagram” is a collective term for diagrams that represent a flow or a set of dynamic relationships in a system. Some flow diagrams are very helpful for visualizing networks or network-like datasets.
We will use the 1960 - 1970 population migration data, which displays the number of people migrating from one country to another. Data used comes from this publication: https://onlinelibrary.wiley.com/doi/abs/10.1111/imre.12327
Sankey diagrams are a type of flow diagram in which the width of the arrows is proportional to the flow rate.
Sankey diagrams can also visualize energy accounts, material flow accounts on a regional or national level, and cost breakdowns. They emphasize the major transfers or flows within a system and help locate the most important contributions to a flow. They often show conserved quantities within defined system boundaries.
We will use our old friend, R package networkD3 here.
# Load dataset from github
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/13_AdjacencyDirectedWeighted.csv", header=TRUE)
# Package
library(networkD3)
library(tidyverse)
# I need a long format
data_long <- data %>%
rownames_to_column %>%
gather(key = 'key', value = 'value', -rowname) %>%
filter(value > 0)
colnames(data_long) <- c("source", "target", "value")
data_long$target <- paste(data_long$target, " ", sep="")
# From these flows we need to create a node data frame: it lists every entity involved in the flow
nodes <- data.frame(name=c(as.character(data_long$source), as.character(data_long$target)) %>% unique())
# With networkD3, connections must be given as ids, not as the real names used in the links data frame, so we need to reformat them.
data_long$IDsource=match(data_long$source, nodes$name)-1
data_long$IDtarget=match(data_long$target, nodes$name)-1
# prepare colour scale
ColourScal ='d3.scaleOrdinal() .range(["#FDE725FF","#B4DE2CFF","#6DCD59FF","#35B779FF","#1F9E89FF","#26828EFF","#31688EFF","#3E4A89FF","#482878FF","#440154FF"])'
# Make the Network
sankeyNetwork(Links = data_long, Nodes = nodes,
Source = "IDsource", Target = "IDtarget",
Value = "value", NodeID = "name",
sinksRight=FALSE, colourScale=ColourScal, nodeWidth=40, fontSize=13, nodePadding=20)
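The `match(...) - 1` step in the code above is needed because networkD3 hands the links to JavaScript, where indexing starts at 0. The base-R sketch below repeats the id construction on a tiny made-up links table (the flow values are invented) and checks that the ids point back to the right names.

```r
# Hypothetical links table; the values are made up
links <- data.frame(
  source = c("Africa", "Africa", "Europe"),
  target = c("Europe ", "Oceania ", "Oceania "),  # trailing space, as in the code above
  value  = c(3.1, 0.4, 1.2),
  stringsAsFactors = FALSE
)

# The trailing space makes the target "Europe " a different node from the
# source "Europe", so a region can appear on both sides of the diagram
nodes <- data.frame(name = unique(c(links$source, links$target)),
                    stringsAsFactors = FALSE)

# Zero-based positions of each source and target in the node table
links$IDsource <- match(links$source, nodes$name) - 1
links$IDtarget <- match(links$target, nodes$name) - 1
print(links)

# Sanity check: adding 1 back must recover the original names
stopifnot(all(nodes$name[links$IDsource + 1] == links$source))
stopifnot(all(nodes$name[links$IDtarget + 1] == links$target))
```

If the `- 1` is forgotten, every link is shifted by one node and the last node is referenced out of range, so a check like the one at the end is cheap insurance.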
For a non-interactive Sankey plot, one can use the R package riverplot.
A chord diagram is a graphical method of displaying the inter-relationships between data in a matrix. The data are arranged radially around a circle with the relationships between the data points typically drawn as arcs connecting the data.
The format can be aesthetically pleasing, making it a popular choice in the world of data visualization.
The primary use of chord diagrams is to show the flows or connections between several entities (called nodes). Each entity is represented by a fragment (often colored or patterned) along the circumference of the circle. Arcs are drawn between entities to show flows (and exchanges in economics). The thickness of the arc is proportional to the significance of the flow.
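Since the thickness of each arc follows the size of the flow, it can help to look at the flows in long format before plotting. The sketch below uses a made-up 3 x 3 flow matrix (the region names and values are invented) and base R only, as an alternative to the tidyr-based reshaping used elsewhere in this chapter.

```r
# Made-up flows between three regions (rows = origin, columns = destination)
flows <- matrix(c(0, 5, 1,
                  2, 0, 4,
                  3, 1, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(from = c("North", "South", "West"),
                                to   = c("North", "South", "West")))

# Long format: one row per origin-destination pair with a non-zero flow
flows_long <- as.data.frame(as.table(flows), responseName = "value",
                            stringsAsFactors = FALSE)
flows_long <- flows_long[flows_long$value > 0, ]

# Share of the total flow carried by each link, largest first
flows_long$share <- flows_long$value / sum(flows_long$value)
flows_long <- flows_long[order(-flows_long$value), ]
print(flows_long)
```

The links at the top of this table are the ones that will dominate the chord diagram.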
We will use the R package circlize here:
library(circlize)
# short names
colnames(data) <- c("Africa", "East Asia", "Europe", "Latin Ame.", "North Ame.", "Oceania", "South Asia", "South East Asia", "Soviet Union", "West.Asia")
rownames(data) <- colnames(data)
# I need a long format
data_long <- data %>%
rownames_to_column %>%
gather(key = 'key', value = 'value', -rowname)
# parameters
circos.clear()
circos.par(start.degree = 90, gap.degree = 4, track.margin = c(-0.1, 0.1), points.overflow.warning = FALSE)
par(mar = rep(0, 4))
# color palette
mycolor <- viridis(10, alpha = 1, begin = 0, end = 1, option = "D")
mycolor <- mycolor[sample(1:10)]
# Base plot
chordDiagram(
x = data_long,
grid.col = mycolor,
transparency = 0.25,
directional = 1,
direction.type = c("arrows", "diffHeight"),
diffHeight = -0.04,
annotationTrack = "grid",
annotationTrackHeight = c(0.05, 0.1),
link.arr.type = "big.arrow",
link.sort = TRUE,
link.largest.ontop = TRUE)
## Note: The second link end is drawn out of sector 'Europe'.
## Note: The second link end is drawn out of sector 'South East Asia'.
# Add text and axis
circos.trackPlotRegion(
track.index = 1,
bg.border = NA,
panel.fun = function(x, y) {
xlim = get.cell.meta.data("xlim")
sector.index = get.cell.meta.data("sector.index")
# Add names to the sector.
circos.text(
x = mean(xlim),
y = 3.2,
labels = sector.index,
facing = "bending",
cex = 0.8
)
# Add graduation on axis
circos.axis(
h = "top",
major.at = seq(from = 0, to = xlim[2], by = ifelse(test = xlim[2]>10, yes = 2, no = 1)),
minor.ticks = 1,
major.tick.length = 0.5,
labels.niceFacing = FALSE)
}
)
Package
library(ggraph)
library(igraph)
library(networkD3)
course<-read.csv("data/course.csv")
networkfriend<-read.csv("data/network.csv")
networkfriend<-as.data.frame(networkfriend)
Now we try the network.csv dataset. It is about the relationships among a group of people.
#rownames
rownames(networkfriend)<-colnames(networkfriend)
#transfer to adjacency matrix
networkfriend<-as.matrix(networkfriend)
head(networkfriend)
## Joe Judy Tom Sarah Alice Emily Bob Tim Peter Helen Jack Kate Jane Jim
## Joe 0 1 1 0 1 1 0 1 0 0 1 0 0 1
## Judy 1 0 0 1 0 0 1 0 0 0 0 0 0 0
## Tom 1 0 0 0 1 0 0 0 0 0 0 0 0 0
## Sarah 0 1 0 0 0 0 0 0 1 1 0 0 0 0
## Alice 1 0 1 0 0 0 0 0 0 0 0 0 0 0
## Emily 1 0 0 0 0 0 0 0 0 1 0 0 0 0
## Alma Amy Anna Bella Cherry Daisy Ella Grace Jenny Jessica Kitty Linda
## Joe 0 0 0 1 0 0 0 1 0 0 1 0
## Judy 0 0 0 0 0 0 0 0 0 0 0 0
## Tom 0 1 0 0 0 1 0 0 0 0 0 0
## Sarah 0 0 0 0 0 0 0 0 0 1 0 1
## Alice 0 0 0 0 0 0 0 0 1 0 0 0
## Emily 0 0 0 0 0 0 1 0 0 0 0 0
## Lisa Mary Tina Ben Andrew Bill Carl David Frank Jason John Kim Mark Nick
## Joe 0 0 0 1 0 0 0 0 1 0 1 0 0 1
## Judy 0 0 0 0 0 0 0 0 0 1 0 0 0 0
## Tom 0 0 1 0 0 0 0 1 0 0 0 0 1 0
## Sarah 0 0 0 0 1 0 0 0 0 1 0 0 0 1
## Alice 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Emily 0 1 0 0 0 0 0 0 1 1 0 0 0 0
## Steven Simon Robert Neil
## Joe 0 0 1 0
## Judy 1 0 0 0
## Tom 0 1 0 1
## Sarah 1 0 0 0
## Alice 0 0 0 0
## Emily 0 0 0 0
networkfriend=graph_from_adjacency_matrix(networkfriend, weighted=TRUE)
# Plot it
# Make the graph
ggraph(networkfriend, layout="fr") +
geom_edge_link(edge_colour="black", edge_alpha=0.2) +
geom_node_point( color="#69b3a2", size=3) +
#scale_edge_width(range=c(1,3)) +
geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(rep(1,4), "cm")
)
ggraph(networkfriend, layout="kk") +
geom_edge_link(edge_colour="black",edge_alpha=0.2) +
geom_node_point(color="#69b3a2",size=3) +
#scale_edge_width(range=c(1,3)) +
geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(rep(1,4), "cm")
)
ggraph(networkfriend, layout="drl") +
geom_edge_link(edge_colour="black",edge_alpha=0.2) +
geom_node_point(color="#69b3a2",size=3) +
#scale_edge_width(range=c(1,3)) +
geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(rep(1,4), "cm")
)
ggraph(networkfriend, layout="igraph",algorithm="randomly") +
geom_edge_link(edge_colour="black",edge_alpha=0.2) +
geom_node_point(color="#69b3a2", size=3) +
#scale_edge_width(range=c(1,3)) +
geom_node_text( aes(label=name), repel = TRUE, size=3, color="#69b3a2") +
theme_void() +
theme(
legend.position="none",
plot.margin=unit(rep(1,4), "cm")
)
“fr” and “kk” are both force-directed algorithms, as is “drl”; the last plot is the result of arranging the nodes randomly. The first two layouts clearly show the topological structure of the social relationships. The random layout is not desirable because it hides that structure, which shows how important the node layout is. The plots show that most participants are Joe’s friends. In addition, many people have met only one of David, Emily, Nick, Judy, Sarah, Bob, Simon, Steven, Tom, Alma, Jane, Linda, Anna, Robert, and John.
Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically-pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible, by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy.
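The idea can be sketched in a few lines of base R. The function below is a simplified teaching version in the spirit of the Fruchterman-Reingold algorithm (repulsion k^2 / d between all pairs, attraction d^2 / k along edges, and a shrinking maximum step size). The function name `force_layout()`, the constants, and the cooling schedule are assumptions made for this sketch; it is not the code that igraph or ggraph run.

```r
# Simplified force-directed layout (a teaching sketch, not igraph's code)
force_layout <- function(edges, n, iterations = 200, seed = 1) {
  set.seed(seed)
  pos  <- matrix(runif(2 * n, -1, 1), ncol = 2)  # random start in a 2 x 2 square
  k    <- sqrt(4 / n)                            # "ideal" distance: sqrt(area / n)
  temp <- 0.1                                    # largest allowed step, cooled below

  for (it in seq_len(iterations)) {
    disp <- matrix(0, n, 2)                      # net force on every node

    # Repulsion between every pair of nodes, strength k^2 / d
    for (i in seq_len(n)) {
      delta <- matrix(pos[i, ], n, 2, byrow = TRUE) - pos  # row j points from node j to node i
      d     <- pmax(sqrt(rowSums(delta^2)), 1e-9)
      push  <- k^2 / d
      push[i] <- 0                               # a node does not repel itself
      disp[i, ] <- colSums(delta / d * push)
    }

    # Attraction along every edge, strength d^2 / k
    for (e in seq_len(nrow(edges))) {
      a <- edges[e, 1]
      b <- edges[e, 2]
      delta <- pos[a, ] - pos[b, ]
      d     <- max(sqrt(sum(delta^2)), 1e-9)
      pull  <- delta / d * (d^2 / k)
      disp[a, ] <- disp[a, ] - pull
      disp[b, ] <- disp[b, ] + pull
    }

    # Move each node along its net force, but by at most `temp`
    len  <- pmax(sqrt(rowSums(disp^2)), 1e-9)
    pos  <- pos + disp / len * pmin(len, temp)
    temp <- temp * 0.98                          # cool down so the layout settles
  }
  pos
}

# Toy graph: two triangles joined by one bridge (node ids are row numbers)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3),
               c(4, 5), c(4, 6), c(5, 6),
               c(3, 4))
xy <- force_layout(edges, n = 6)
print(round(xy, 2))
```

The result is a matrix with one row of x and y coordinates per node; it can be drawn with `plot(xy)` plus `segments()` for the edges. Connected nodes should end up closer together than unconnected ones, which is the property the layouts in this chapter rely on.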
library(igraph)
library(ggraph)
library(networkD3)
Preparation
We use the novel “长安十二时辰” (The Longest Day in Chang'an) as an example of quantitative relationships.
Use igraph to draw the relationship plot
edge <- read.table('data/12edge.txt',header=T,fileEncoding="UTF-16") # load edge
net <- graph_from_data_frame(d=edge, directed=FALSE) # create the network
plot(net)
plot(net,
layout=layout_with_fr) # Fruchterman-Reingold layout
plot(net,
layout=layout_nicely) # Optimal Layout
E(net)$weight <- edge$relation #create the weight
plot(net,
layout=layout_nicely,
vertex.shape="none") #delete node
plot(net,
layout=layout_nicely,
vertex.shape="none",
edge.width=E(net)$weight) #use weight to represent the edge
plot(net,
layout=layout_nicely,
vertex.shape="none",
edge.width=E(net)$weight/20) #adjust the weight
plot(net,
layout=layout_nicely,
vertex.shape="none",
edge.width=E(net)$weight/20,
vertex.label.cex=0.4) # adjust the size of the label
plot(net,
layout=layout_nicely,
vertex.shape="none",
edge.width=E(net)$weight/20,
vertex.label.cex=0.4,
vertex.label.color='black') #change the color of the label
net <- graph_from_data_frame(d=edge, directed=FALSE) # undirected
ggraph(net) +
geom_edge_link() + # add edge
geom_node_point() # add node
## Using `stress` as default layout
ggraph(net, layout='graphopt')+
geom_node_text(aes(label = name)) # adjust the layout and add label to avoid overlap
ggraph(net, layout='graphopt')+
geom_node_label(aes(label = name)) #adjust the layout
ggraph(net, layout='graphopt')+
geom_node_label(aes(label = name), repel=TRUE) # adjust layout to avoid the overlap
ggraph(net, layout='kk')+
geom_node_label(aes(label = name)) ## use "kk" to adjust the layout
ggraph(net, layout='kk')+
geom_node_label(aes(label = name)) +
geom_edge_link() # add edge
ggraph(net, layout='kk')+
geom_node_label(aes(label = name)) +
geom_edge_link(alpha = 0.3, colour = '#377EB8') #Change the transparency and color of the edges
ggraph(net, layout='kk')+
geom_node_label(aes(label = name)) +
geom_edge_link(alpha = 0.3, colour = '#377EB8') +
theme_graph(background = 'white')
E(net)$weight <- edge$relation
ggraph(net, layout='kk')+
geom_node_label(aes(label = name)) +
geom_edge_link(aes(edge_width = E(net)$weight), alpha = 0.3, colour = '#377EB8') +
theme_graph(background = 'white') # map relation to the width of the edge
E(net)$weight <- edge$relation
ggraph(net, layout='graphopt')+
geom_node_label(aes(label = name)) +
geom_edge_link(aes(edge_width = E(net)$weight), alpha = 0.3, colour = '#377EB8') +
theme_graph(background = 'white') +
theme(legend.position = "none") # adjust the layout
ggraph(net, layout='graphopt')+
geom_node_label(aes(label = name)) +
geom_edge_arc(aes(edge_width = E(net)$weight), alpha = 0.25, colour = '#377EB8') +
theme_graph(background = 'white') +
theme(legend.position = "none") # change the edges to arcs
ggraph(net, layout='linear', circular=TRUE)+
geom_node_label(aes(label = name)) +
geom_edge_link(aes(edge_width = E(net)$weight), alpha = 0.25, colour = '#377EB8') +
theme_graph(background = 'white') +
theme(legend.position = "none")
ggraph(net, layout='linear', circular=TRUE)+
geom_node_label(aes(label = name)) +
geom_edge_arc(aes(edge_width = E(net)$weight), alpha = 0.25, colour = '#377EB8') +
theme_graph(background = 'white') +
theme(legend.position = "none")
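# Label size scales with node degree. The hard-coded values below equal degree / 2 + 2,
# which is what the second commented line computes for 21 nodes. Call igraph::degree()
# explicitly if another package (e.g. circlize) masks degree().
#lab_size=as.vector(degree(net))
#lab_size=degree(net, normalized = TRUE) * 10 + 2
lab_size=c(7.0, 9.0, 8.5, 7.0, 9.0, 6.0, 12.5, 6.0, 11.5, 10.0, 5.0, 8.5, 8.0, 9.5, 6.5, 10.5, 7.5, 9.5, 8.5, 7.0, 3.0)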
ggraph(net, layout='graphopt')+
geom_node_label(aes(label = name), size=lab_size) +
geom_edge_link(aes(edge_width = E(net)$weight), alpha = 0.3, colour = '#377EB8') +
theme_graph(background = 'white') +
theme(legend.position = "none") # use the degree of each node to set its label size
net <- graph_from_data_frame(d=edge, directed=FALSE) #undirected
netD3 <- igraph_to_networkD3(net) # transfer the igraph to networkD3
netD3$nodes$group <- '1' # add group in the dataframe
#netD3$nodes$degree <- degree(net) # add degree in the data frame
netD3$nodes$degree <- c(10, 14, 13, 10, 14, 8, 21, 8, 19, 16, 6, 13, 12, 15, 9, 17, 11, 15, 13, 10, 2)
forceNetwork(Links = netD3$links, Nodes = netD3$nodes,
Source = 'source', Target = 'target',
NodeID = 'name', Group = 'group',
Value = 'value',
Nodesize = 'degree',
#fontFamily = '黑体',
charge = -100,
zoom = TRUE,
bounded = FALSE,
opacityNoHover = TRUE)
1.1 Social Network and 2-d visualizations
In the text visualization section, we demonstrated some visualizations of a text network built from a co-occurrence matrix, where each node is a main character in the book Pride and Prejudice and each edge represents the co-occurrence of two characters in the same sentence. As this network represents the relationships among a group of people, it is a perfect example of a social network.
Let’s re-cap that network quickly.
1.1.1 load the co-occurrence matrix as a network
In the above co-occurrence matrix, each number is the number of sentences that contain both the word in the column and the word in the row. For example, there are 74 sentences that contain both elizabeth and jane.
Note that this is only a fraction of the matrix.
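For readers who want to see where such counts come from, here is a minimal base-R sketch. The three sentences are made up (they are not the text of the novel), and matching names with `grepl()` is a simplification of whatever tokenization the text visualization section used.

```r
# Three made-up sentences (not the actual text of the novel)
sentences <- c(
  "elizabeth and jane walked to meryton",
  "darcy looked at elizabeth",
  "jane wrote to elizabeth about bingley"
)
characters <- c("elizabeth", "jane", "darcy", "bingley")

# Incidence matrix: one row per sentence, one column per character,
# 1 when the name occurs in the sentence
incidence <- sapply(characters, function(name) {
  as.integer(grepl(name, sentences, fixed = TRUE))
})

# crossprod(X) = t(X) %*% X counts, for each pair of characters,
# the sentences in which both occur
cooc <- crossprod(incidence)
diag(cooc) <- 0  # a character co-occurring with itself is not informative
print(cooc)
```

Here elizabeth and jane share two sentences, so their entry is 2, just as the entry 74 in the real matrix counts the sentences shared by those two names.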
Now we treat each character as a node and connect them by their co-occurrence, just as in the text visualization section. The following two graphs should be familiar to you.
Or we can use ggraph for a fine-tuned visualization.
1.1.2 Layout Matters!
The layout of a network refers to how we present the network, e.g., how we decide the location of the nodes and the distance between them.
We will use the highschool dataset included in the ggraph package. The dataset shows how friendships among a group of high school students changed from 1957 to 1958. It is stored as an edge list, as follows:
1.1.2.1 layout igraph randomly
Let’s start with a random layout.
1.1.2.2 layout fr
The layout Fruchterman-Reingold is a force-directed layout algorithm. The idea of a force directed layout algorithm is to consider a force between any two nodes. In this algorithm, the nodes are represented by steel rings and the edges are springs between them. The attractive force is analogous to the spring force and the repulsive force is analogous to the electrical force. The basic idea is to minimize the energy of the system by moving the nodes and changing the forces between them.
This layout is usually useful for visualizing very large undirected networks.
1.1.2.3 layout kk
The Kamada-Kawai layout is another force-based algorithm. It performs very well for connected graphs but gives poor results for disconnected ones. Due to
1.1.2.4 layout Drl
DrL is another force-directed graph layout toolbox focused on real-world large-scale graphs, developed by Shawn Martin and colleagues at Sandia National Laboratories.
1.1.2.5 layout sphere
The sphere layout places the vertices (approximately) uniformly on the surface of a sphere, so it is a 3-d layout. The benefit of using it here is clear: the location of each student is relatively fixed, so we can easily compare the two years.
1.1.2.6 combined plot
Or we can plot the two networks directly in one plot, as the nodes are the same. This time, we will plot four layouts and try to decide which one makes more sense just by comparing them.
There is no such thing as “the best layout algorithm”, as algorithms have been optimized for different scenarios. Experimenting with them and choosing the one that suits your data is often helpful!
There are other layouts available in ggraph; more details can be found here: https://www.rdocumentation.org/packages/igraph/versions/0.7.1/topics/layout
1.1.3 Dependency network (Course Structure example)
In the above examples, all nodes are of the same type. However, a network can be constructed in many ways! For example, we can build a network to represent the dependency between nodes at different levels. This is very similar to clustered data.
The Course Structure data (course.csv) describes the structure of the courses included in the Master of Statistics programs (Mathematical Statistics, Economic Statistics, and Epidemiology and Health Statistics) at Renmin University of China. There are two types of nodes: some are courses and the others are programs. A course and a program are connected if the course belongs to the program.
We visualize this network as an interactive network so that the clustering pattern is highlighted when you hover over a program node. At the same time, the courses shared between programs are also clear.
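The two-type structure can be sketched in base R. The table below is made up: the program names come from the text, but the course names and the column layout are hypothetical, and the real course.csv may be organized differently.

```r
# Made-up membership table; the real course.csv may be laid out differently
course <- data.frame(
  program = c("Mathematical Statistics", "Mathematical Statistics",
              "Economic Statistics", "Economic Statistics",
              "Epidemiology and Health Statistics"),
  course  = c("Probability", "Regression", "Regression", "Econometrics",
              "Probability"),
  stringsAsFactors = FALSE
)

# Two kinds of nodes, programs and courses, in a single node table
programs <- unique(course$program)
courses  <- unique(course$course)
nodes <- data.frame(
  name = c(programs, courses),
  type = rep(c("program", "course"), c(length(programs), length(courses))),
  stringsAsFactors = FALSE
)
print(nodes)

# A course linked to more than one program is shared between programs
programs_per_course <- table(course$course)
shared <- names(programs_per_course[programs_per_course > 1])
print(shared)
```

Each row of the membership table is one edge between a program node and a course node, and the shared courses are the nodes that tie the programs together in the plot.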
If you want to color the nodes, you will have to use a more complicated function: forceNetwork. forceNetwork provides more arguments so that you can fine-tune your plot.