In this chapter, we continue to discuss the visualization basics. The following R packages are needed for running the examples in this chapter.
library(tidyverse)
library(grid)
library(gridExtra)
library(RColorBrewer)
So far, we have focused on the scatter plot, which mostly visualizes quantitative variables. To visualize qualitative variables, we need the barplot.
We use the barplot to show the frequency distribution of the categories of a qualitative variable, and hence its composition. For example, in the college data set, suppose we are interested in the number of universities in each region of the US; we can use a barplot to visualize this information.
Note that a barplot is easier to read when the categorical axis has a meaningful order. For example, we could sort the bars by their frequencies.
college = read.csv("data/college.csv", header = TRUE)
g1 = ggplot(data=college) +
geom_bar(aes(x=region))
g2 = ggplot(data=college) +
geom_bar(aes(y=fct_infreq(region))) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 500)) +
ylab("region") +
xlab("Number of observations")
grid.arrange(g1,g2,ncol=2)
The code above first counts the number of universities in each region and then plots the frequencies. Alternatively, we could calculate the frequencies ourselves and plot them directly. To do so, we either set stat="identity" in geom_bar() or use geom_col(). Note that ordering the axis works slightly differently in geom_col() than in geom_bar().
region_freq = college %>% group_by(region) %>% summarise(count=n())
region_freq # frequencies of types
## # A tibble: 4 x 2
## region count
## <chr> <int>
## 1 Midwest 353
## 2 Northeast 299
## 3 South 459
## 4 West 158
Please try the following code on your computer. All six plots show the same frequencies as before; they differ only in how the bars are ordered.
g1 = ggplot(data=region_freq) +
geom_col(aes(y=region, x = count))
g2 = ggplot(data=region_freq) +
geom_col(aes(y=fct_reorder(region, -count), x = count))
g3 = ggplot(data=region_freq) +
geom_col(aes(y=fct_infreq(region), x = count))
g4 = ggplot(data=region_freq) +
geom_bar(aes(y=region, x = count), stat="identity")
g5 = ggplot(data=region_freq) +
geom_bar(aes(y=fct_reorder(region, -count), x = count), stat="identity") + ylab("region")
g6 = ggplot(data=college) +
geom_bar(aes(y=fct_infreq(region))) + ylab("region")
grid.arrange(g1,g2,g3,g4,g5,g6, ncol=3)
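To see why the ordering helpers behave differently on raw and summarised data, the following sketch inspects the factor levels directly instead of drawing the plots. It rebuilds region_freq from the counts printed above; raw_region is a hypothetical stand-in for the region column of the raw data, created here only so that the example runs without the CSV file. fct_infreq() orders levels by how often they occur, so it needs one row per university, whereas fct_reorder() orders levels by a second variable and therefore suits the summarised table.

```r
library(forcats)

# Frequency table copied from the output shown above
region_freq <- data.frame(
  region = c("Midwest", "Northeast", "South", "West"),
  count  = c(353, 299, 459, 158)
)

# Summarised data: order the regions by the (negated) count column
by_count <- fct_reorder(region_freq$region, -region_freq$count)
print(levels(by_count))

# Raw data (hypothetical stand-in): one entry per university,
# so the number of occurrences carries the frequency information
raw_region <- rep(region_freq$region, times = region_freq$count)
by_freq <- fct_infreq(raw_region)
print(levels(by_freq))
```

In the summarised table every region appears exactly once, so counting occurrences cannot distinguish the regions; this is why the geom_col() versions rely on fct_reorder() with the count column.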
When visualizing amounts across different categories, the barplot may not be the most suitable tool. Take a look at the following example, where we plot the average SAT score of each state in the US using a barplot, with the bars sorted by SAT score. SAT scores are mostly between 800 and 1400 (you get 400 just for putting your name down). When we average across all universities in each state, the variation between states becomes even smaller. Therefore, we mostly see many long bars of similar lengths (left figure). The visualization is accurate, but not informative. We can emphasize the differences in SAT score by restricting the range of the SAT axis, as in the middle figure. However, the length of each bar is then no longer proportional to the SAT score, which violates one of the most important principles in data visualization. For example, it seems that the SAT score of North Carolina (NC) is about twice that of West Virginia (WV), since the bar of the former is twice as long as the bar of the latter. We will discuss this principle in detail in the following chapter. In this case, we can use a dot plot instead (right figure), which is just a scatter plot with one discrete axis. The dot only indicates the location of the data point, so no bar length can be misread.
state_sat_df = college %>%
group_by(state) %>%
summarize(state_sat = mean(sat_avg))
g1 = state_sat_df %>%
ggplot() +
geom_col(aes(y=fct_reorder(state, state_sat), x = state_sat), width = 0.7) +
scale_y_discrete(name = "State") +
xlab("Average SAT")
g2 = state_sat_df %>%
ggplot() +
geom_col(aes(y=fct_reorder(state, state_sat), x = state_sat), width = 0.7) +
scale_y_discrete(name = "State") +
coord_cartesian(xlim = c(950,1200) ) +
scale_x_continuous(expand = c(0,0)) +
xlab("Average SAT") +
theme(axis.ticks.y = element_blank())
g3 = state_sat_df %>%
ggplot() +
geom_point(aes(y=fct_reorder(state, state_sat), x = state_sat), size = 2) +
scale_y_discrete(name = "State") +
coord_cartesian(xlim = c(950,1200) ) +
scale_x_continuous(expand = c(0,0)) +
xlab("Average SAT") +
geom_vline(xintercept = mean(college$sat_avg),
color = "blue",
linetype = "dotted")+
theme(axis.ticks.y = element_blank(),
panel.grid.major.y = element_line(color = "grey",
linetype = "dashed"))
grid.arrange(g1,g2,g3,ncol=3)
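The distortion caused by a truncated axis can be checked with simple arithmetic. The sketch below uses two hypothetical state averages (they are illustrative values, not taken from the college data) and compares the true ratio of the scores with the ratio of the bar lengths when the axis starts at 950, as in the middle figure.

```r
# Hypothetical average SAT scores for two states (illustrative values only)
sat <- c(A = 1150, B = 1050)

# The axis of the truncated barplot starts here instead of at 0
axis_start <- 950

# Ratio of the actual scores
true_ratio <- sat[["A"]] / sat[["B"]]

# Ratio of the visible bar lengths once the axis is truncated
drawn_ratio <- (sat[["A"]] - axis_start) / (sat[["B"]] - axis_start)

print(round(true_ratio, 3))
print(drawn_ratio)
```

With these numbers the scores differ by less than 10 percent, yet one bar is drawn twice as long as the other. A dot plot avoids the problem because the reader compares positions rather than lengths.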
The barplot can visualize not only the distribution of one qualitative variable but also, through its variations, the relationship between two qualitative variables. For example, the position argument of geom_bar() controls how the bars of different groups are arranged; its options include “identity”, “dodge”, “fill”, and “stack”.
For the college data set, suppose we want to visualize the association between the region of the university (region) and the funding type of the university (control). We can use the following barplots.
g1=ggplot(data=college) +
geom_bar(aes(x=fct_relevel(region, "South", "Midwest", "Northeast", "West"),
fill = control),
width=0.75) +
scale_x_discrete(name = "Region") +
scale_fill_discrete(name = "")+
theme(legend.position = "top",
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
g2=ggplot(data=college) +
geom_bar(aes(x=region,
fill = control),
position="fill",
width = 0.75) +
scale_x_discrete(name = "Region") +
scale_fill_discrete("")+
scale_y_continuous(name="percent", labels = scales::percent)+
theme(legend.position = "top",
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
g3=ggplot(data=college) +
geom_bar(aes(x=fct_relevel(region, "South", "Midwest", "Northeast", "West"),
fill = control),
position="dodge",
width=0.75) +
scale_x_discrete(name = "Region") +
scale_fill_discrete("") +
theme(legend.position = "top",
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
grid.arrange(g1,g2,g3,ncol=3)
ggplot(data=college) +
geom_bar(aes(x=fct_relevel(region, "South", "Midwest", "Northeast", "West"),
fill = control),
width=0.75) +
scale_x_discrete(name = "Region") +
scale_y_continuous(limits = c(0, 250)) +
facet_wrap(~control)
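The numbers behind the stacked, dodged, and filled barplots can be reproduced by hand. The sketch below is a minimal illustration on a small made-up data frame (the rows and the level names "Public" and "Private" are assumptions for the example, not values from the college data): the stacked and dodged versions draw the raw counts of each region and control combination, while position="fill" rescales the counts within each region so that every bar has total height 1.

```r
# Small made-up data set (hypothetical rows, for illustration only)
toy <- data.frame(
  region  = c("South", "South", "South", "West", "West",
              "Midwest", "Midwest", "Midwest", "Midwest"),
  control = c("Public", "Private", "Private", "Public", "Public",
              "Private", "Public", "Private", "Private")
)

# Counts per region and control: the heights used by "stack" and "dodge"
counts <- table(toy$region, toy$control)
print(counts)

# Shares within each region: the heights used by "fill"
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Each row of shares sums to 1, which is why the filled barplot shows composition but hides how many universities each region has.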
Population Pyramid
We use Saudi Arabia’s population data to generate a population pyramid as follows.
saudi = read_csv("data/saudi_arabia.csv")
## Rows: 102 Columns: 12
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (4): FIPS, GENC, Country/Area Name, GROUP
## dbl (8): Year, Population, % of Population, Male Population, % of Males, Fem...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
saudi
## # A tibble: 102 x 12
## FIPS GENC `Country/Area Name` Year GROUP Population `% of Population`
## <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 SA SA Saudi Arabia 2023 TOTAL 35939806 100
## 2 SA SA Saudi Arabia 2023 0 494327 1.38
## 3 SA SA Saudi Arabia 2023 1 495889 1.38
## 4 SA SA Saudi Arabia 2023 2 502139 1.40
## 5 SA SA Saudi Arabia 2023 3 511583 1.42
## 6 SA SA Saudi Arabia 2023 4 522930 1.46
## 7 SA SA Saudi Arabia 2023 5 536550 1.49
## 8 SA SA Saudi Arabia 2023 6 551843 1.54
## 9 SA SA Saudi Arabia 2023 7 568419 1.58
## 10 SA SA Saudi Arabia 2023 8 584245 1.63
## # ... with 92 more rows, and 5 more variables: `Male Population` <dbl>,
## # `% of Males` <dbl>, `Female Population` <dbl>, `% of Females` <dbl>,
## # `Sex Ratio` <dbl>
saudi %>%
select(GROUP, `Male Population`, `Female Population`) %>%
rename(age = GROUP,
male = `Male Population`,
female = `Female Population`) %>%
filter(!age %in% c("TOTAL", "100+")) %>% # drop the total row and the open-ended 100+ group
mutate(age = as.numeric(age)) %>%
mutate(age_group = cut(age, breaks = seq(0, 100, 5), right = FALSE)) %>%
pivot_longer(cols = c("male", "female"),
names_to = "gender",
values_to = "pop") %>%
ggplot() +
geom_col(aes(y = age_group,
x = ifelse(gender == "male", pop, -pop),
fill = gender)) +
scale_x_continuous(labels = abs) +
xlab("Population") +
ylab("Age")
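Two steps of this pipeline are easy to get wrong, so the following sketch checks them on small vectors. The vectors groups and ages are made up for the illustration. The first part compares an element-wise test against a vector with a membership test using %in%; the second part shows how cut() with right = FALSE assigns ages to five-year groups that are closed on the left.

```r
# Made-up labels in an arbitrary order (illustration only)
groups <- c("100+", "0", "TOTAL", "1")

# Element-wise comparison: the length-2 vector is recycled, so each label
# is compared with only one of the two values, depending on its position
keep_recycled <- groups != c("TOTAL", "100+")

# Membership test: each label is compared with both values
keep_in <- !(groups %in% c("TOTAL", "100+"))

print(groups[keep_recycled])
print(groups[keep_in])

# Five-year age groups, closed on the left: [0,5), [5,10), ...
ages <- c(0, 4, 5, 99)
age_group <- cut(ages, breaks = seq(0, 100, 5), right = FALSE)
print(as.character(age_group))
```

In this example the recycled comparison keeps "100+" because it happens to be compared with "TOTAL" only, whereas the membership test removes both labels regardless of row order.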
We briefly discuss the pie chart, which is used to visualize the composition of a qualitative variable. There are two forms of pie chart: the typical filled circle and the colored ring. The pie chart uses the angle or the arc length to represent the proportion of each category or unique value. Since these angles or arcs often have different orientations, comparing categories is often difficult. This is why we do not recommend the pie chart. Instead, we should use the barplot, which is more efficient.
Let us take a look at an example. Suppose we would like to compare the number of universities in five states in the Midwest: OH, MI, IN, IL, and WI. We generate the pie chart, the ring chart, and the barplot for comparison. As we can see, other than OH, which has the most universities, it is hard to compare the states because their angles and ring segments are almost the same. In the barplot, it is apparent that WI has the lowest number, while IL is higher than IN and IN is higher than MI. This insight cannot be easily obtained from the pie chart or the donut chart, which is why we should avoid using them. There are some remedies for these charts, such as adding percentage labels next to the slices. But a visualization is meant to be self-explanatory, efficient, and faithful, and relying on added text conflicts with these goals.
g1 = ggplot(filter(college, state %in% c("OH", "MI", "IN", "IL", "WI")),
aes(x = 1, fill = state)) +
geom_bar() +
scale_fill_discrete("State")+
coord_polar(theta = "y") +
theme(panel.background = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
legend.position = "top")
g2=ggplot(filter(college, state %in% c("OH", "MI", "IN", "IL", "WI")),
aes(x = 1, fill = state)) +
geom_bar(width = 0.8) +
scale_fill_discrete("State")+
coord_polar(theta = "y") +
scale_x_continuous(limits=c(0,1.5)) + # extend the x scale to [0, 1.5]; the empty range below the bar becomes the hole of the ring
theme(panel.background = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
legend.position = "top")
g3=ggplot(filter(college, state %in% c("OH", "MI", "IN", "IL", "WI")),
aes(x = state, fill=state)) +
scale_fill_discrete("State")+
geom_bar() +
theme(legend.position = "top")
grid.arrange(g1,g2,g3,ncol=3)
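The difficulty of reading a pie chart can be made concrete by computing the slice angles. The counts below are hypothetical numbers chosen only to mimic the pattern described above (they are not the actual counts in the college data); each slice angle is the share of the category multiplied by 360 degrees.

```r
# Hypothetical counts for five states (illustration only)
counts <- c(OH = 90, MI = 60, IN = 62, IL = 64, WI = 58)

# Share of each state and the corresponding slice angle in degrees
share <- counts / sum(counts)
angle <- 360 * share

print(round(angle, 1))

# Difference between two neighbouring slices, in degrees
print(round(angle[["IL"]] - angle[["IN"]], 1))
```

In this example a difference of two universities changes the slice by only about two degrees, which is hard to judge by eye, whereas the same difference is directly visible as a difference in bar height.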
So far, we have been using the default colors in ggplot(). Color is an important element of visualization, and in many situations we need to customize it to improve the figure. Here, we discuss how to customize colors.
Colors can be specified in three main ways in R: color names, RGB values, and hexadecimal strings.
First, we could refer to colors by their names, e.g. “red”, “orange”, “yellow”, “wheat”, “salmon”. R has 657 built-in color names, such as “red”, “cyan”, and “chocolate”. To see the list, type colors(). These colors are shown here1.
sample(colors(), 20) # colors() returns all 657 names; here we print a random sample of 20
## [1] "magenta2" "purple2" "darkseagreen" "indianred"
## [5] "turquoise2" "papayawhip" "lightgoldenrod" "darkgreen"
## [9] "grey27" "darkorchid4" "orangered3" "lightskyblue"
## [13] "moccasin" "lemonchiffon3" "steelblue" "dodgerblue4"
## [17] "grey93" "darkturquoise" "steelblue1" "gray63"
Alternatively, we could refer to each color by its RGB values, e.g., (255, 135, 0), which give the amounts of red, green, and blue in the color. Each color in R can be represented by a numeric vector of three numbers ranging from 0 to 255, one each for red, green, and blue; this is called the RGB color system2. Therefore, there are in total 256*256*256 = 16,777,216 possible colors in the RGB color system.
rgb(255, 165, 0, maxColorValue = 255)
## [1] "#FFA500"
rgb(1, 0.5, 0)
## [1] "#FF8000"
rgb(1, 0.5, 0, alpha = 0.5) # alpha represents the transparency level.
## [1] "#FF800080"
Lastly, we could refer to colors by their hexadecimal strings, e.g., “#FF0000”, “#FFA500”, “#FFFF00”. R internally uses hexadecimal (or hex) to represent colors. Hexadecimal is a base-16 number system. Red, green, and blue are each represented by two characters (#rrggbb), and each character is one of 16 possible symbols: 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. For example, white is rgb(255,255,255), which can be represented as #FFFFFF = 255*65536+255*256+255; a medium gray is rgb(128,128,128), which can be represented as #808080 = 128*65536+128*256+128; and brown is (165,42,42), or #A52A2A.
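The three representations can be converted into one another with base R functions. The following sketch uses col2rgb() to look up the RGB values of a named color, rgb() to turn them into a hex string, and strtoi() to read the hex digits as a single base-16 number; the variable names are chosen for this illustration.

```r
# RGB values of a named color: a 3 x 1 matrix with rows red, green, blue
rgb_brown <- col2rgb("brown")
print(rgb_brown[, 1])

# The same color as a hexadecimal string
hex_brown <- rgb(rgb_brown["red", 1],
                 rgb_brown["green", 1],
                 rgb_brown["blue", 1],
                 maxColorValue = 255)
print(hex_brown)

# The six hex digits read as one base-16 number
hex_value <- strtoi("A52A2A", base = 16L)
print(hex_value)

# The same number built from the three channels
print(165 * 65536 + 42 * 256 + 42)
```

Each pair of hex digits encodes one channel: A5 is 165, and 2A is 42.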
Here is an example where we use all three ways to refer to colors in a scatter plot.
d = data.frame(x = c(1,2,3,4,5), y = c(1,2,3,4,5), class = c("AA", "AA", "BB", "CC", "DD"))
ggplot(d) + geom_point(aes(x,y,color=class),size=4)+
scale_color_manual(breaks = c("AA", "BB", "CC", "DD"),
values = c(rgb(250, 250, 10, maxColorValue=255),
rgb(0.1, .2, 0.7, alpha=0.3),
"#A52A2A",
"salmon"))
Choosing colors manually is time consuming, especially when there are many categories. Instead, we can rely on existing color palettes in R, which store sequences of pre-specified colors suitable for representing continuous and discrete data. Two widely used sources of palettes are the R packages RColorBrewer and viridis, displayed as follows.
The RColorBrewer package contains the following palettes. As we can see, there are three types of palettes: sequential, qualitative, and diverging. The sequential palettes can be used for continuous variables. The qualitative palettes can be used for discrete variables. The diverging palettes can be used for continuous variables that take both positive and negative values, such as correlations. To use the RColorBrewer palettes, we use one of the following three functions, depending on the type of variable:
scale_colour_distiller() is a continuous color scale.
scale_colour_brewer() is a discrete color scale.
scale_colour_fermenter() is a binned color scale.
Meanwhile, the viridis package contains a small set of palettes (four are shown below). All of these palettes can be applied to both continuous and discrete variables. To use the viridis palettes, we use one of the following three functions, depending on the type of variable:
scale_fill_viridis_c() is a continuous color scale.
scale_fill_viridis_d() is a discrete color scale.
scale_fill_viridis_b() is a binned color scale.
For quantitative/continuous variables, we can use the following functions to customize colors.
scale_color_viridis_c() for viridis palettes such as viridis, magma, etc.
scale_color_distiller() for RColorBrewer palettes such as YlOrRd, YlOrBr, etc.
scale_color_gradient() for interpolating between two colors.
scale_color_gradient2() for interpolating between three colors.
scale_color_gradientn() for interpolating between n colors (typically more than three).
Back to the college data example, suppose we would like to color the points by a quantitative variable, here the admission rate. We first show examples of a viridis color palette, an RColorBrewer color palette, and a color sequence created by interpolating colors.
library(RColorBrewer)
college = read.csv("data/college.csv", header = TRUE)
g = ggplot(college) + geom_point(aes(x=sat_avg, y=admission_rate, color = admission_rate),size=2)
g1 = g + scale_color_continuous(name = "Adm Rt")
g2 = g + scale_color_viridis_c(name = "Adm Rt")
g3 = g + scale_color_viridis_c(name = "Adm Rt", option = "magma") # try option = "plasma" "inferno" or "cividis"
g4 = g + scale_color_distiller(name = "Adm Rt", palette = "Spectral")
g5 = g + scale_color_distiller(name = "Adm Rt", palette = "Greys")
g6 = g + scale_color_gradient(name = "Adm Rt", low = "black", high = "green")
g7 = g + scale_color_gradient2(name = "Adm Rt", low = "blue", mid = "white", high = "yellow", midpoint = 0.5)
g8 = g + scale_color_gradientn(name = "Adm Rt", colors = c("black", "red", "pink", "blue", "green"))
g9 = g + scale_color_gradientn(name = "Adm Rt", colors = colorspace::diverge_hcl(7))
grid.arrange(g1,g2,g3,g4,g5,g6,g7,g8,g9,ncol=3)
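To get a feeling for what "interpolating colors" means, the sketch below builds small color sequences with colorRampPalette() from base R. This is only an analogy to the gradient scales: colorRampPalette() interpolates in RGB space by default, whereas the ggplot2 gradient scales interpolate in a different color space, so the intermediate colors are not expected to match the legends above exactly.

```r
# Two anchor colors, as in scale_color_gradient(low = "black", high = "green")
ramp2 <- colorRampPalette(c("black", "green"))
two_color <- ramp2(5)
print(two_color)

# Three anchor colors, as in scale_color_gradient2(low, mid, high)
ramp3 <- colorRampPalette(c("blue", "white", "yellow"))
three_color <- ramp3(5)
print(three_color)
```

The first and last entries are the anchor colors themselves, and with an odd number of steps the middle entry of the three-color ramp is the mid color.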
We usually use diverging colors to represent numerical values that can be both positive and negative, for example, correlations.
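library(tidyverse)
college = read.csv("data/college.csv", header = TRUE)
# correlation matrix of the numeric variables, using complete rows only
cor_mat = college %>%
  select(admission_rate, sat_avg, undergrads, tuition, faculty_salary_avg, loan_default_rate, median_debt) %>%
  drop_na() %>%
  cor()
# reshape to long format: one row per pair of variables
# (the column names var1, var2 and correlation are our own choice)
cor_mat %>%
  as_tibble() %>%
  mutate(var1 = rownames(cor_mat)) %>%
  pivot_longer(cols = admission_rate:median_debt,
               names_to = "var2",
               values_to = "correlation") %>%
  # assumed completion: a heatmap with a diverging fill scale centered at 0
  ggplot() +
  geom_tile(aes(x = var1, y = var2, fill = correlation)) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))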
For qualitative/discrete variables, we can use the following functions to customize colors.
scale_color_viridis_d() for viridis palettes such as viridis, magma, etc.
scale_color_brewer() for RColorBrewer palettes such as Set1, Set2, etc.
scale_color_manual() for manually assigning colors to levels of discrete variables.
Back to the college data example, suppose we would like to use the color of the points to represent the city of the university. We first show examples of viridis and RColorBrewer color palettes. We then customize the colors manually, for which we usually use scale_color_manual().
g <- college %>%
filter(city %in% c("New York", "Los Angeles", "Cincinnati", "Chicago")) %>%
ggplot() +
geom_point(aes(x=sat_avg, y=tuition, color = city), size=2)
g1 = g + scale_color_viridis_d()
g2 = g + scale_color_brewer(palette = "Set3")
g3 = g + scale_color_manual(breaks = c("New York", "Los Angeles", "Cincinnati", "Chicago"),
values = c("red", "blue", "yellow", "pink"))
grid.arrange(g1,g2,g3,ncol=3)
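The palettes behind these scales can also be inspected directly as hex strings, which is convenient when a few palette colors should be passed to scale_color_manual(). The sketch below assumes that the RColorBrewer and viridisLite packages are installed (viridisLite is normally installed together with ggplot2); it requests four colors because the example above has four cities.

```r
library(RColorBrewer)

# Four colors from the qualitative RColorBrewer palette "Set3"
set3_cols <- brewer.pal(4, "Set3")
print(set3_cols)

# Four colors from two viridis palettes; these strings end with an
# extra pair of hex digits that encodes the alpha (opacity) channel
viridis_cols <- viridisLite::viridis(4)
magma_cols <- viridisLite::viridis(4, option = "magma")
print(viridis_cols)
print(magma_cols)
```

Any of these character vectors can be supplied as the values argument of scale_color_manual().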
Note that we can set colors in a similar way for other types of visualization. Here are some examples.
g1=ggplot(filter(college, state %in% c("OH", "MI", "IN", "IL", "WI")),
aes(x = state, fill=state)) +
geom_bar() +
scale_fill_viridis_d(option = "magma")
data("faithfuld")
erupt <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
geom_raster() + scale_x_continuous(NULL, expand = c(0, 0)) + scale_y_continuous(NULL, expand = c(0, 0)) +
theme(legend.position = "none")
g2 = erupt + scale_fill_viridis_c(option = "magma")
grid.arrange(g1,g2,ncol=2)
Note that, by default, scale_color_continuous() is equivalent to scale_color_gradient(). In addition, scale_color_discrete() is equivalent to scale_color_hue().
In ggplot2, there are many functions to adjust the colors. We mostly focus on the following functions.
scale_color_brewer(): for a qualitative variable mapped to color, use an RColorBrewer palette.
scale_color_distiller(): for a quantitative variable mapped to color, use an RColorBrewer palette.
scale_color_viridis_d(): for a qualitative variable mapped to color, use a viridis color palette.
scale_color_viridis_c(): for a quantitative variable mapped to color, use a viridis color palette.
scale_color_gradient(): for a quantitative variable mapped to color, interpolate between two colors to get a palette (low-high).
scale_color_manual(): for a qualitative variable mapped to color, manually specify the color for each level.
The first two use RColorBrewer color palettes. The next two use viridis color palettes. The last two are the most flexible functions: scale_colour_gradient() and scale_colour_manual() for continuous and discrete variables, respectively. Note that there are six corresponding functions for fill, such as scale_fill_brewer().
For the color aesthetic, a complete list of scale functions is given below. For the fill aesthetic, a similar set of functions can be obtained by replacing *_color_*() with *_fill_*(). They work in the same way.
scale_color_brewer(): for a qualitative variable mapped to color, use an RColorBrewer palette.
scale_color_distiller(): for a quantitative variable mapped to color, use an RColorBrewer palette.
scale_color_fermenter(): for a binned variable mapped to color, use an RColorBrewer palette.
scale_color_continuous(): defaults to scale_color_gradient().
scale_color_binned(): defaults to scale_color_steps().
scale_color_discrete(): defaults to scale_color_hue()/scale_color_brewer().
scale_color_gradient(): for a quantitative variable mapped to color, interpolate between two colors to get a palette (low-high).
scale_color_gradient2(): for a quantitative variable mapped to color, interpolate between three colors to get a palette (low-mid-high).
scale_color_gradientn(): for a quantitative variable mapped to color, interpolate between n colors to get a palette.
scale_color_grey(): for a qualitative variable mapped to color, use a sequence of greys running from dark to light.
scale_color_hue(): for a qualitative variable mapped to color, use evenly spaced hues; this palette is not colour-blind safe.
scale_color_identity(): for a qualitative variable mapped to color, where the variable already contains colors as its values.
scale_color_manual(): for a qualitative variable mapped to color, manually specify the color for each level.
scale_color_steps(): for a binned variable mapped to color, interpolate between two colors to get a palette (low-high).
scale_color_steps2(): for a binned variable mapped to color, interpolate between three colors to get a palette (low-mid-high).
scale_color_stepsn(): for a binned variable mapped to color, interpolate between n colors to get a palette.
scale_color_viridis_d(): for a qualitative variable mapped to color, use a viridis color palette.
scale_color_viridis_c(): for a quantitative variable mapped to color, use a viridis color palette.
scale_color_viridis_b(): for a binned variable mapped to color, use a viridis color palette.
Color is an incredibly complex topic, and we have only scratched the surface. Some additional resources on colors are photopea for color coding3, Adobe color wheel4, color palette5, colorbrewer6, colororacle7, and a simulator for colorblindness8.
To enhance the visualization, we can display multiple figures side by side or in a grid for better comparison. To set up the layout, we use the R packages grid and gridExtra.
The grid package provides a low-level graphics system to access the graphics facilities in R. The gridExtra package provides a number of user-level functions that work with the grid package to arrange multiple figures on a page. More specifically, we use the grid.arrange() function to set up the layout. Here is an example.
g1 = ggplot(college, aes(sat_avg, tuition)) +
geom_point(size=0.1)
g2 = ggplot(college, aes(faculty_salary_avg, tuition))+
geom_point(size=0.1)
g3 = ggplot(college, aes(loan_default_rate, tuition))+
geom_point(size=0.1)
g4 = ggplot(college, aes(undergrads, tuition))+
geom_point(size=0.1)
g5 = ggplot(college, aes(admission_rate, tuition))+
geom_point(size=0.1)
g6 = ggplot(college, aes(median_debt, tuition))+
geom_point(size=0.1)
library(grid)
library(gridExtra)
plots<-list(g1,g2,g3,g4,g5,g6) # put the 6 plots in one list
vp <- viewport(width =0.6, height =1) # create a viewport whose width is 0.6 and height is 1 (relative to the page)
grid.arrange(grobs = plots, ncol = 2,vp=vp)
## Warning: Removed 2 rows containing missing values (`geom_point()`).
We can display different types of visualization together too. Here we draw a histogram, a stacked density plot, and a boxplot. To use a more complex layout with customized widths and heights for each panel, we use the following code.
p1 = ggplot(college, aes(x=tuition)) + geom_histogram() +
labs(title="Histogram of tuition")
p2 = ggplot(college, aes(x=tuition, y=after_stat(density), fill=control)) + # after_stat() replaces the deprecated ..density.. notation
geom_density(position="stack")+
labs(title="PDF of tuition",x="Tuition")
p3 = ggplot(college, aes(y=tuition, x=control,fill=control))+
geom_boxplot(outlier.size=.3)+
labs(title="Boxplot of tuition")
lay1 = rbind(c(1, 1),
c(2, 3)) #matrix of layout
library(knitr)
knitr::kable(lay1)
1 | 1 |
2 | 3 |
plots1=list(p1,p2,p3)
grid.arrange(grobs = plots1,
layout_matrix = lay1, # matrix of layout is lay1
widths = c(2, 1),heights = c(0.5,1),
# widths 2:1, heights 1:2
# try deleting the widths/heights line above and see what happens
top="Distribution of tuition")
##
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
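The layout matrix can be read as a map of the page: each entry is the index of a plot in the list, and a repeated index lets that plot span several cells. The following self-contained sketch reproduces the same kind of layout with the built-in mtcars data, which is used here only so that the example runs without the college file; the plot names are chosen for the illustration.

```r
library(ggplot2)
library(gridExtra)

# Three small plots based on the built-in mtcars data
h <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(bins = 10)
s <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
b <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

# Plot 1 spans the whole first row; plots 2 and 3 share the second row
lay <- rbind(c(1, 1),
             c(2, 3))

# arrangeGrob() builds the layout without drawing it
g <- arrangeGrob(grobs = list(h, s, b),
                 layout_matrix = lay,
                 widths = c(2, 1),   # first column twice as wide as the second
                 heights = c(1, 2))  # second row twice as tall as the first

# Draw the arranged figure on a new page
grid::grid.newpage()
grid::grid.draw(g)
```

The widths and heights are relative, so c(2, 1) and c(4, 2) would give the same layout.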
To adjust the margins of the plot, we specify the arguments in the theme layer.
ggplot(college,aes(x=sat_avg, y=tuition))+
geom_point()+
theme(plot.margin = unit(c(3,3,3,3), "cm")) # all 4 margins of the plot (top, right, bottom, left) are 3 cm
Finally, to save the plot, we use the ggsave() and arrangeGrob() functions.
m<-arrangeGrob(grobs = plots1, layout_matrix = lay1,
widths = c(2, 1), heights = c(1, 0.5),
top="Distribution of Tuition")
ggsave(filename="my_fig.png", plot=m, width = 25, height = 20) # width and height are in inches by default
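When saving, it helps to state the units explicitly, because ggsave() interprets width and height in inches unless told otherwise. The sketch below saves a small plot of the built-in mtcars data to a temporary file; the file name and the choice of a PDF file are assumptions made for this illustration, and changing the file extension changes the output format.

```r
library(ggplot2)

# A small plot based on the built-in mtcars data
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Write to a temporary folder so that the working directory stays clean
out_file <- file.path(tempdir(), "my_fig_demo.pdf")

# State the units explicitly: 25 cm wide and 20 cm tall
ggsave(filename = out_file, plot = p, width = 25, height = 20, units = "cm")

print(file.exists(out_file))
```

The same call works for an arranged figure created with arrangeGrob(), which is passed as the plot argument.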