Today we are discussing ggplot2. Below is the code we talked through, showing the basic syntax and approach of ggplot2.

install.packages("ggplot2") # Install the ggplot2 library.
library("ggplot2") # Load the ggplot2 library.

colnames(midwest) # See the column names of the "midwest" dataset

# From http://r4stats.com/examples/graphics-ggplot2/:
# <blockquote>
# To understand ggplot, you need to ask yourself, what are the fundamental parts of every data graph? They are:
# Aesthetics - these are the roles that the variables play in each graph. A variable may control where points appear, the color or shape of a point, the height of a bar and so on.
# Geoms - these are the geometric objects. Do you need bars, points, lines?
# Statistics - these are the functions like linear regression you might need to draw a line.
# Scales - these are legends that show things like circular symbols represent females while circles represent males.
# Facets - these are the groups in your data. Faceting by gender would cause the graph to repeat for the two genders.
# </blockquote>

# Set up the basic plot:
our_plot <- ggplot(
  midwest, # The dataset
  aes(x = percollege, y = percbelowpoverty)
)

# From the basic plot, make a scatterplot:
our_scatterplot <- our_plot + geom_point()
our_scatterplot

# Let's add a linear regression line to that ("lm" is R's command for running a basic "linear model"):
our_scatterplot + geom_smooth(method = lm)

# Let's take away the background on that:
our_scatterplot + geom_smooth(method = lm) + theme_classic()

# Let's highlight Ohio and Wisconsin's data:
our_scatterplot +
  theme_classic() +
  geom_point(
    data = midwest[midwest$state %in% c("OH", "WI"), ],
    aes(x = percollege, y = percbelowpoverty),
    color = "orange",
    size = 1.2
  )

# Let's see all 5 states for which we have data at once:
our_scatterplot + facet_grid(. ~ state)

# ". ~ state" above is of the form "rows ~ (distributed as) columns". So if we use the following, we can put the graph into rows:
our_scatterplot + facet_grid(state ~ .)

# Let's compare counties in metro areas vs. counties not in metro areas:
our_plot +
  geom_point(
    aes(
      color = factor(inmetro, labels = c("Not in Metro Area", "In Metro Area"))
      # We could also use "shape" instead of "color" here (or in addition to it)
    )
  ) +
  xlab("Percent of the population with a college degree") +
  ylab("Percent of the population below the poverty level") +
  labs(color = "County is in a metro area?") +
  theme_classic()

# Which county, and in which state, has the highest percentage of people below the poverty level?
midwest$county[midwest$percbelowpoverty == max(midwest$percbelowpoverty)]
midwest$state[midwest$percbelowpoverty == max(midwest$percbelowpoverty)]
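
The comment in the metro-area plot mentions that "shape" could be used instead of, or in addition to, "color". Here is a minimal sketch of the "in addition to" version. The names metro_labels and shape_plot are made up for this example and were not part of what we talked through. Giving both legends the same title is what should let ggplot2 draw them as a single combined legend.

```r
library(ggplot2) # Also provides the "midwest" dataset

# Hypothetical helper name: the labels for the two values of "inmetro" (0 and 1)
metro_labels <- c("Not in Metro Area", "In Metro Area")

# Hypothetical plot name: map the same variable to both color and shape
shape_plot <- ggplot(midwest, aes(x = percollege, y = percbelowpoverty)) +
  geom_point(
    aes(
      color = factor(inmetro, labels = metro_labels),
      shape = factor(inmetro, labels = metro_labels)
    )
  ) +
  labs(
    color = "County is in a metro area?",
    shape = "County is in a metro area?" # Same title as "color", so the legends can merge
  ) +
  theme_classic()

shape_plot
```

Using both aesthetics at once can help when the plot is printed in black and white, since the point shapes still distinguish the two groups.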
