Lab 05 - Sad plots

Load packages and data

library(tidyverse) 
library(dsbox)

Instructors

Exercise 1

instructors_long = instructors %>% 
  pivot_longer(cols=!year,
               names_to = "faculty_type",
               values_to = "value")

Exercise 2

instructors_long %>%
  ggplot(aes(x=year, y=value, color=faculty_type)) + 
  geom_line()+
  labs(x="Year", y="Percent of Hires", title="Faculty hiring shows a rise in faculty not on the tenure track")+
  scale_color_discrete(name="Type of faculty", labels = c("Non-tenure-track", "Tenure-track", "Tenured", "Grad Student", "Part-time"))

Exercise 3

To emphasize part-time faculty, I will move them to the first factor level, give their line a distinct color (red), and rewrite the title so it states the trend directly.

Exercise 4

instructors_long = instructors_long %>%
  mutate(faculty_type = fct_relevel(faculty_type, "part_time"))

instructors_long %>%
  ggplot(aes(x=year, y=value, color=faculty_type))+
  geom_line()+
  labs(x="Year", y="Percent of Hires", title="Since 1975, more and more part-time faculty have been hired")+
  # colors and labels follow the level order, so part-time (now the first level) comes first
  scale_color_manual(name="Type of faculty",
                     values = c("red", "skyblue", "blue", "navy", "green"),
                     labels = c("Part-time","Non-tenure-track", "Tenure-track", "Tenured", "Grad Student"))
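With unnamed values, scale_color_manual() hands out colors in the order of the factor levels, which is why moving "part_time" to the front with fct_relevel() makes it the red line. A minimal sketch on a hypothetical toy vector (not the instructors data) shows the resulting level order and builds a color vector keyed by level name; passing a named vector like this to scale_color_manual(values = ...) keeps part-time red even if the level order changes later.

```r
library(forcats)

# hypothetical toy values standing in for faculty_type
types = c("full_time_tenured", "part_time", "grad_student", "part_time")

# factor() sorts the levels alphabetically; fct_relevel() then moves "part_time" to the front
f = fct_relevel(types, "part_time")
levels(f)

# name each color by its level so the mapping does not depend on level order
highlight = setNames(ifelse(levels(f) == "part_time", "red", "grey"), levels(f))
highlight
```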

We could go further and grey out every line except part-time, but I didn’t know how to make the labels clear. Other options are possible.
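One way to keep the labels clear without a legend is to write each label directly at the right-hand end of its line. The sketch below uses only ggplot2 and dplyr on hypothetical toy data (the numbers are made up and only mimic the long shape of instructors_long): lines are colored by whether they are part-time, and geom_text() labels only the rows from the last year.

```r
library(ggplot2)
library(dplyr)

# hypothetical toy data in the same long shape as instructors_long (values are made up)
toy = tibble(
  year = rep(c(1975, 1995, 2011), times = 3),
  faculty_type = rep(c("part_time", "full_time_tenured", "grad_student"), each = 3),
  value = c(24, 33, 41, 29, 25, 17, 20, 22, 19)
)

# keep one row per line (the last year) to place the direct labels
end_labels = toy %>% filter(year == max(year))

p = ggplot(toy, aes(x = year, y = value, group = faculty_type,
                    color = faculty_type == "part_time")) +
  geom_line() +
  # left-aligned labels just to the right of each line's last point
  geom_text(data = end_labels, aes(label = faculty_type), hjust = 0, nudge_x = 1) +
  # part-time in red, everything else in grey; no legend needed
  scale_color_manual(values = c(`TRUE` = "red", `FALSE` = "grey60"), guide = "none") +
  expand_limits(x = 2030) +
  labs(x = "Year", y = "Percent of Hires")
p
```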

Here is a nice example solution from the TA’s lab group (J Marvald, C Hammond, and unnamed coauthors):

# Run the following once in the console to install the package used for the labels
# install.packages("ggrepel")
# Then load the library in the R Markdown file
library(ggrepel) #could be here or at the top with the other libraries


# make new df for labels
label_instructors =
  instructors_long %>%
  mutate(percent = value) #change of variable name to match their code

# create a label for the last year only, so each line gets one label at its right end
label_instructors = label_instructors %>%
  mutate(label = if_else(year == max(year), as.character(faculty_type), NA_character_))

# use new df with labels to plot 
label_instructors %>%
  ggplot(aes(x = year, y = percent, color = faculty_type)) +
  geom_line() + 
  xlim(1975, 2020) +
  ylim(0, 45) +
  geom_label_repel(aes(label = label),
                   nudge_x = 9,
                   nudge_y = -3,
                   force = 4,
                   arrow = arrow(length = unit(0.02, "npc")),
                   segment.linetype = 4,
                   na.rm = TRUE) +
  theme(legend.position = "none") +
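  # after Exercise 4, part_time is the first level, so pick the color by level name rather than position
  scale_color_manual(values = if_else(levels(label_instructors$faculty_type) == "part_time", "red", "grey")) + 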
  labs(x = "Year", y = "Percent of Total Hires", col = "Type of Hire", 
       title = "Trends in Instructional Staff Employment Status (1975-2011)")

Fisheries

The following solution comes from last year's lab group “Damson”. I think a lot of you this year made changes even more impressive than this, but many answers are reasonable as long as you wrote out your thought process in Ex. 5.

Exercise 5

There are many low-yield countries, which makes each plot look very busy. It would help to sum the captured and aquaculture-produced fish for the lower-yield countries into a single group. The line plot should be replaced with a different type of plot: its colors appear to correspond to the type of fishery, yet the line still varies within each country, which is odd. A single bar plot showing caught and aquaculture-grown fish for the top countries, plus the summed low-yield nations, would be far more useful.
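Summing the low-yield countries into one group takes two steps: lump the country factor by total harvest, then add up the rows within each group. A minimal sketch on hypothetical toy data (five made-up countries "A" to "E"): fct_lump_n() with w = total keeps the two largest producers and relabels the rest as "Other", and summarise() then turns the three "Other" rows into one total.

```r
library(dplyr)
library(forcats)

# hypothetical toy data: one row per country, tons of fish (values are made up)
toy = tibble(
  country = c("A", "B", "C", "D", "E"),
  capture = c(100, 60, 5, 3, 2),
  aquaculture = c(200, 10, 1, 4, 0)
)

toy_lumped = toy %>%
  mutate(total = capture + aquaculture,
         # keep the 2 countries with the largest total weight; everything else becomes "Other"
         group = fct_lump_n(country, n = 2, w = total, other_level = "Other")) %>%
  # one row per group, so "Other" holds the summed harvest of C, D and E
  group_by(group) %>%
  summarise(capture = sum(capture), aquaculture = sum(aquaculture), .groups = "drop")
toy_lumped
```

Without the summarise() step, a bar plot would still have one row per small country under "Other", and dodged bars in the same group would be drawn on top of each other instead of being added up.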

Steps (thought process):
1. Pivot longer.
2. Isolate the top countries (top 5) and lump the rest: fct_lump_min(n= ??, w=values captured).
3. Bar graph: A. proportion captured, B. proportion aquaculture.
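For the proportions in step 3, computing them after the low-yield countries have been summed gives the share for the "Other" group as a whole; averaging each small country's own proportion would instead weight a tiny producer the same as a large one. A minimal sketch, assuming hypothetical already-summed totals (made-up numbers, one row per group):

```r
library(dplyr)

# hypothetical summed totals per group (e.g. the output of lumping and summarising)
summed = tibble(
  group = c("A", "B", "Other"),
  capture = c(100, 60, 10),
  aquaculture = c(200, 10, 5)
)

props = summed %>%
  mutate(total = capture + aquaculture,
         capture_prop = capture / total,          # A. proportion captured
         aquaculture_prop = aquaculture / total)  # B. proportion aquaculture
props
```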

Exercise 6

fisheries_total = fisheries %>%
  replace(is.na(.), 0) %>%
  mutate(total = capture + aquaculture)

# keep the 6 largest producers by total harvest, lump the rest into "Other",
# and shorten "People's Republic of China" to "China"
fisheries_top = fisheries_total %>%
  mutate(fct = fct_lump_n(country, 6, w = total, other_level = "Other")) %>%
  mutate(fct = fct_recode(fct, "China" = "People's Republic of China")) %>%
  # sum within each group so "Other" is a total rather than many overlapping bars
  group_by(fct) %>%
  summarise(capture = sum(capture),
            aquaculture = sum(aquaculture),
            .groups = "drop")

fisheries_top %>%
  pivot_longer(cols = c("capture", "aquaculture"), 
               names_to = "fishery_type",
               values_to = "tons_of_fish") %>%
  mutate(country_fct = fct_relevel(fct, "China", "Peru", "USA", "Indonesia", 
                                   "India", "Other")) %>%
  ggplot(aes(x = country_fct, y = tons_of_fish, fill = fishery_type)) +
  geom_col(position = position_dodge()) +
  scale_x_discrete(labels = function(x) str_wrap(x, width = 10)) +
  labs(title = "Amount of Fish Harvested (Tons): Aquaculture vs Capture",
       subtitle = "Top 5 Countries with Highest Fish Harvest",
       x = "Country",
       y = "Amount of Fish (Tons)") +
  scale_fill_manual("Fishery Type",
                    labels = c("Aquaculture", "Capture"),
                    values = c("red", "black"))

Rubric

25 points total.

5 questions @ 3 points for correct and complete answers
5 points github commit history
5 points coding style, R chunks are properly labelled, and your figures are reasonably sized.