Given below are two data visualizations that violate many data visualization best practices. Improve these visualizations using R and the tips for effective visualizations that we introduced in class. For exercises 4 and 6, you should produce one visualization per dataset. Your visualization should be accompanied by a brief paragraph describing the choices you made in your improvement, specifically discussing what you didn’t like in the original plots and why, and how you addressed them in the visualization you created.

In class on 6 October, you will give a brief presentation describing one of your improved visualizations and the reasoning for the choices you made. For this, it’s fine to just step through your markdown explaining the plot and code.

Learning goals

Telling a story with data
Data visualization best practices
Reshaping data

Getting started

Go to the course GitHub organization and locate your repo. Clone it in RStudio and open the R Markdown document. Knit the document to make sure it compiles without errors.

Warm up

Before we introduce the data, let’s warm up with some simple exercises. Update the YAML of your R Markdown file with your information, knit, commit, and push your changes. Make sure to commit with a meaningful commit message. Then, go to your repo on GitHub and confirm that your changes are visible in your Rmd and md files. If anything is missing, commit and push again.

Packages

We’ll use the tidyverse package for much of the data wrangling and visualisation, and the data lives in the dsbox package. Either load the dsbox library or, if that does not work, read in the data files from the lab repo.

library(tidyverse)  # includes readr, which provides read_csv()
library(dsbox)      # use this if the package is installed

# Otherwise, read the data from the lab repo
instructors <- read_csv("data/instructors.csv")
fisheries <- read_csv("data/fisheries.csv")

Data

The datasets we’ll use are called instructors and fisheries from the dsbox package. If you can load the library, the datasets become available to us when we load the package. Otherwise, read in the data. You can find out more about the datasets by inspecting their documentation, which you can access by running ?instructors and ?fisheries in the Console or using the Help menu in RStudio to search for instructors or fisheries. You can also find this information here and here.

Exercises

Instructional staff employment trends

The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employees between 1975 and 2011, and contains an image very similar to the one given below.

Let’s start by loading the data used to create this plot.

instructors

## # A tibble: 11 x 6
##     year full_time_tenured full_time_tenure_track full_time_no~1 part_~2 grad_~3
##    <dbl>             <dbl>                  <dbl>          <dbl>   <dbl>   <dbl>
##  1  1975              29                     16.1           10.3    24      20.5
##  2  1989              27.6                   11.4           14.1    30.4    16.5
##  3  1993              25                     10.2           13.6    33.1    18.1
##  4  1995              24.8                    9.6           13.6    33.2    18.8
##  5  1999              21.8                    8.9           15.2    35.5    18.7
##  6  2001              20.3                    9.2           15.5    36      19  
##  7  2003              19.3                    8.8           15      37      20  
##  8  2005              17.8                    8.2           14.8    39.3    19.9
##  9  2007              17.2                    8             14.9    40.5    19.5
## 10  2009              16.8                    7.6           15.1    41.1    19.4
## 11  2011              16.7                    7.4           15.4    41.3    19.3
## # ... with abbreviated variable names 1: full_time_non_tenure_track,
## #   2: part_time, 3: grad_student

Each row in this dataset represents a year for which we have data, and, apart from year, each column is a faculty type. The values are the percentage of hires of that type of faculty for each year.

In order to recreate this visualization, we first need to pivot the data so that, alongside year, there is one variable for faculty type and one variable for the corresponding percentage. We do the wide-to-long conversion using pivot_longer(), discussed in lecture 7.

If there are 5 faculty types and 11 years of data, how many rows will the pivoted data have? Do the pivot and save the data into an object called instructors_long.
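As a reminder of how pivot_longer() reshapes data, here is a minimal sketch on a cut-down copy of the table printed above (three years and two faculty-type columns only). The object names `instructors_small` and `instructors_small_long` are made up for this illustration. The column names `faculty_type` and `value` are chosen to match the plotting code used later in this lab.

```r
library(tidyr)
library(tibble)

# Three rows and two faculty-type columns copied from the printed instructors data
instructors_small <- tibble(
  year = c(1975, 1993, 2011),
  full_time_tenured = c(29, 25, 16.7),
  part_time = c(24, 33.1, 41.3)
)

# Stack every column except year into faculty_type / value pairs
instructors_small_long <- instructors_small %>%
  pivot_longer(
    cols = -year,
    names_to = "faculty_type",
    values_to = "value"
  )

instructors_small_long
```

With 3 years and 2 faculty-type columns, the long version has 3 × 2 = 6 rows, one per combination of year and faculty type. The same counting applies to the full dataset.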

Now we can attempt to make this plot:

instructors_long %>%
  ggplot(aes(x = year, y = value, color = faculty_type)) +
  geom_line()

Include the line plot you made above in your report and make sure the figure width is large enough to make it legible. Also fix the title, axis labels, and legend label.
Suppose the objective of this plot was to show that the proportion of part-time faculty has gone up over time compared to other instructional staff types. What changes would you propose making to this plot to tell this story, and why?
Implement the changes you proposed in the previous exercise.
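The sketch below shows one possible way to set a title, axis labels, and a legend label with labs(), and to draw attention to a single series by muting the others. It is only an illustration on a few values copied from the table printed above, not the expected answer. The object names `plot_data` and `p`, the colours, and the label text are all assumptions made for this example.

```r
library(ggplot2)
library(tibble)

# A few values copied from the printed instructors data, already in long form
plot_data <- tibble(
  year = rep(c(1975, 1993, 2011), times = 2),
  faculty_type = rep(c("Full-time tenured", "Part-time"), each = 3),
  value = c(29, 25, 16.7, 24, 33.1, 41.3)
)

p <- ggplot(plot_data, aes(x = year, y = value, color = faculty_type)) +
  geom_line() +
  # Colour only the series the story is about and mute the other one
  scale_color_manual(
    values = c("Full-time tenured" = "gray60", "Part-time" = "firebrick")
  ) +
  labs(
    title = "Part-time faculty make up a growing share of hires",
    x = "Year",
    y = "Percentage of hires",
    color = "Faculty type"
  ) +
  theme_minimal()

p
```

To control the size of the figure in the knitted report, you can set the knitr chunk options fig.width and fig.height in the header of the code chunk that draws the plot.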

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Fisheries

The Fisheries and Aquaculture Department of the Food and Agriculture Organization of the United Nations collects data on the fisheries production of countries. This Wikipedia page lists fishery production of countries for 2016. For each country, tonnage from capture and aquaculture is listed. Note that countries whose total harvest was less than 100,000 tons are not included in the visualization.

A researcher shared with you the following visualization they created based on these data. 😳

Can you help them improve it? First, brainstorm how you would improve it. It’s ok if some of your improvements are aspirational, i.e. you don’t know how to implement it, but you think it’s a good idea.

Load the data.

fisheries

## # A tibble: 75 x 3
##    country    capture aquaculture
##    <chr>        <dbl>       <dbl>
##  1 Algeria     126259         368
##  2 Angola      240000          NA
##  3 Argentina   931472        2430
##  4 Australia   245935       47087
##  5 Bangladesh 1333866      882091
##  6 Brazil      750283      257783
##  7 Cambodia    384000       26000
##  8 Canada     1080982      154083
##  9 Chile      4330325      698214
## 10 Colombia    121000       60072
## # ... with 65 more rows

Create a new data visualisation for these data that implements the improvements you proposed in the previous exercise (or as many of them as you can).
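If you are unsure where to start, the following sketch shows one possible direction: compute a total per country, reshape to long form, and draw horizontal stacked bars ordered by total harvest. It uses five rows copied from the output printed above. The object names `fisheries_small`, `fisheries_long`, and `p_fish` are made up for this example, and treating a missing aquaculture value as zero is an assumption you should check against the data documentation before relying on it.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(tibble)

# Five rows copied from the printed fisheries data
fisheries_small <- tibble(
  country = c("Angola", "Bangladesh", "Brazil", "Canada", "Chile"),
  capture = c(240000, 1333866, 750283, 1080982, 4330325),
  aquaculture = c(NA, 882091, 257783, 154083, 698214)
)

fisheries_long <- fisheries_small %>%
  # ASSUMPTION: a missing value is treated as no reported production
  mutate(aquaculture = replace_na(aquaculture, 0)) %>%
  mutate(total = capture + aquaculture) %>%
  # One row per country and production method
  pivot_longer(
    cols = c(capture, aquaculture),
    names_to = "method",
    values_to = "tons"
  )

# Horizontal bars, with countries ordered by their total harvest
p_fish <- ggplot(
  fisheries_long,
  aes(x = tons, y = reorder(country, total), fill = method)
) +
  geom_col() +
  labs(
    title = "Fishery production by country, 2016",
    x = "Harvest (tons)",
    y = NULL,
    fill = "Production method"
  )

p_fish
```

Ordering the bars by total makes the ranking of countries readable at a glance, and placing country names on the vertical axis keeps the labels horizontal.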

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards and review the md document on GitHub to make sure you’re happy with the final state of your work.

Lab 05 - Take a sad plot and make it better