Code for a brief introduction to ggplot2

Today we are discussing ggplot2. Below is the code that we talked through, showing the basic syntax and approach of ggplot2.

install.packages("ggplot2") # Install the ggplot2 package (only needed once).
library("ggplot2") # Load the ggplot2 package (needed in every new R session).

colnames(midwest) # See the column names of the "midwest" dataset

# From
# To understand ggplot, you need to ask yourself, what are the fundamental parts of every data graph? They are:
# Aesthetics - these are the roles that the variables play in each graph. A variable may control where points appear, the color or shape of a point, the height of a bar and so on.
# Geoms - these are the geometric objects. Do you need bars, points, lines?
# Statistics - these are the functions like linear regression you might need to draw a line.
# Scales - these are the legends that show, for example, that circular symbols represent females while square symbols represent males.
# Facets - these are the groups in your data. Faceting by gender would cause the graph to repeat for the two genders.
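
# A minimal sketch of how those five parts can map onto ggplot2 code, using the "midwest"
# dataset that ships with ggplot2. The name "parts_demo" and the legend title are just
# illustrative, and each piece is introduced one step at a time further down:
library("ggplot2")
parts_demo <- ggplot(
  midwest,
  aes(x = percollege, y = percbelowpoverty, color = factor(inmetro)) # Aesthetics
) +
  geom_point() +                                  # Geom
  stat_smooth(method = "lm") +                    # Statistic
  scale_color_discrete(name = "In metro area?") + # Scale
  facet_grid(. ~ state)                           # Facet
parts_demo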

# Set up the basic plot:
our_plot <- ggplot(
  midwest, # The dataset
  aes(x = percollege, y = percbelowpoverty) # The variables for the x and y axes
)

# From the basic plot, make a scatterplot:
our_scatterplot <- our_plot + geom_point()

# Let's add a linear regression line to that ("lm" is R's command for running a basic "linear model"):
our_scatterplot + geom_smooth(method = lm)
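
# "lm" can also be run on its own, outside of ggplot2. As a sketch (the name
# "poverty_model" is just illustrative), this fits the same kind of straight line that
# geom_smooth(method = lm) draws by default, and prints its intercept and slope:
library("ggplot2") # Provides the "midwest" dataset
poverty_model <- lm(percbelowpoverty ~ percollege, data = midwest)
coef(poverty_model)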

# Let's take away the background on that:
our_scatterplot + geom_smooth(method = lm) + theme_classic()

# Let's highlight Ohio and Wisconsin's data:
our_scatterplot + theme_classic() +
geom_point(data = midwest[
midwest$state %in% c("OH", "WI"),
], aes(x = percollege, y = percbelowpoverty) , color="orange", size=1.2)

# Let's see all 5 states for which we have data at once:
our_scatterplot + facet_grid(. ~ state)
# ". ~ state" above is of the form "rows ~ columns" (the "~" is often read as "distributed as"). So if we use the following, we can put the panels into rows instead:
our_scatterplot + facet_grid(state ~ .)
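
# A related option, sketched here as an aside: facet_wrap() wraps the panels onto several
# rows instead of one long row or column. The name "wrapped_plot" and the choice of 3
# columns are just illustrative:
library("ggplot2")
wrapped_plot <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty)) +
  geom_point() +
  facet_wrap(~ state, ncol = 3)
wrapped_plot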

# Let's compare counties in metro areas vs. counties not in metro areas:
our_plot + geom_point(
  # "inmetro" is a column of the dataset, so it has to be mapped inside aes():
  aes(color = factor(inmetro, labels = c("Not in Metro Area", "In Metro Area")))
  # We could also use "shape" instead of "color" here (or in addition to it)
) +
  xlab("Percent of the population with a college degree") +
  ylab("Percent of the population below the poverty level") +
  labs(color = "County is in a metro area?")
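
# As a sketch of the "shape" option mentioned above, the same factor can be mapped to both
# color and shape. Giving both legends the same title lets ggplot2 merge them into one.
# (The name "shape_plot" is just illustrative.)
library("ggplot2")
shape_plot <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty)) +
  geom_point(aes(
    color = factor(inmetro, labels = c("Not in Metro Area", "In Metro Area")),
    shape = factor(inmetro, labels = c("Not in Metro Area", "In Metro Area"))
  )) +
  labs(color = "County is in a metro area?", shape = "County is in a metro area?")
shape_plot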

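# Which county has the highest percentage of people below the poverty level, and which state is it in?
midwest$county[midwest$percbelowpoverty == max(midwest$percbelowpoverty)]
midwest$state[midwest$percbelowpoverty == max(midwest$percbelowpoverty)]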
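
# A more compact alternative, as a sketch: which.max() returns the row number of the
# first maximum, so the county and state can be pulled out together. Unlike the "== max()"
# approach above, it returns only one row even if several counties were tied.
# (The name "poorest" is just illustrative.)
library("ggplot2") # Provides the "midwest" dataset
poorest <- midwest[which.max(midwest$percbelowpoverty), c("county", "state")]
poorest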

