Given below are two data visualizations that violate many data visualization best practices. Improve these visualizations using R and the tips for effective visualizations that we introduced in class. For exercises 4 and 6, you should produce one visualization per dataset. Your visualization should be accompanied by a brief paragraph describing the choices you made in your improvement, specifically discussing what you didn’t like in the original plots and why, and how you addressed them in the visualization you created.
In class on 6 October, you will give a brief presentation describing one of your improved visualizations and the reasoning for the choices you made. For this, it’s fine to just step through your markdown explaining the plot and code.
Go to the course GitHub organization, locate your repo, clone it in RStudio, and open the R Markdown document. Knit the document to make sure it compiles without errors.
Before we introduce the data, let’s warm up with some simple exercises. Update the YAML of your R Markdown file with your information, knit, commit, and push your changes. Make sure to commit with a meaningful commit message. Then, go to your repo on GitHub and confirm that your changes are visible in your Rmd and md files. If anything is missing, commit and push again.
We’ll use the tidyverse package for much of the data wrangling and visualisation, and the data live in the dsbox package. Either load the dsbox package or read the data from the data folder of the lab repo.
library(tidyverse)
library(dsbox) # use this if the package loads
library(readr) # otherwise read the CSV files (readr is also loaded by tidyverse)
instructors <- read_csv("data/instructors.csv")
fisheries <- read_csv("data/fisheries.csv")
The datasets we’ll use are called instructors and fisheries, from the dsbox package. If you can load the package, the datasets become available as soon as it is loaded; otherwise, read in the data. You can find out more about the datasets by inspecting their documentation, which you can access by running ?instructors and ?fisheries in the Console or by using the Help menu in RStudio to search for instructors or fisheries. You can also find this information here and here.
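If you would rather not switch between the two loading options by hand, one possible sketch is to check whether dsbox is installed and fall back to the CSV files otherwise. It assumes the CSV files sit in the data/ folder of the lab repo, as in the loading code above; requireNamespace() and data(..., package = ) are base R functions.

```r
library(tidyverse)

# Use the dsbox copies of the data when the package is installed;
# otherwise fall back to the CSV files in the lab repo's data/ folder
# (assumed paths, matching the read_csv() calls in the lab code).
if (requireNamespace("dsbox", quietly = TRUE)) {
  data("instructors", package = "dsbox")
  data("fisheries", package = "dsbox")
} else {
  instructors <- read_csv("data/instructors.csv")
  fisheries <- read_csv("data/fisheries.csv")
}
```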
The American Association of University Professors (AAUP) is a nonprofit membership association of faculty and other academic professionals. This report compiled by the AAUP shows trends in instructional staff employment between 1975 and 2011 and contains an image very similar to the one given below.
Let’s start by loading the data used to create this plot.
instructors
## # A tibble: 11 x 6
## year full_time_tenured full_time_tenure_track full_time_no~1 part_~2 grad_~3
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1975 29 16.1 10.3 24 20.5
## 2 1989 27.6 11.4 14.1 30.4 16.5
## 3 1993 25 10.2 13.6 33.1 18.1
## 4 1995 24.8 9.6 13.6 33.2 18.8
## 5 1999 21.8 8.9 15.2 35.5 18.7
## 6 2001 20.3 9.2 15.5 36 19
## 7 2003 19.3 8.8 15 37 20
## 8 2005 17.8 8.2 14.8 39.3 19.9
## 9 2007 17.2 8 14.9 40.5 19.5
## 10 2009 16.8 7.6 15.1 41.1 19.4
## 11 2011 16.7 7.4 15.4 41.3 19.3
## # ... with abbreviated variable names 1: full_time_non_tenure_track,
## # 2: part_time, 3: grad_student
Each row in this dataset represents a year, and each column other than year represents a faculty type. The values are the percentage of hires of that type of faculty in each year.
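A quick way to check that the values are percentages: within each year, the five faculty-type values should add up to about 100. The minimal sketch below retypes three rows from the printed output above so it runs on its own; instructors_sample is a hypothetical name, and in your report you would apply the same mutate() to the full instructors data.

```r
library(tidyverse)

# Three rows retyped from the printed instructors output, so the sketch is self-contained
instructors_sample <- tribble(
  ~year, ~full_time_tenured, ~full_time_tenure_track,
  ~full_time_non_tenure_track, ~part_time, ~grad_student,
  1975, 29.0, 16.1, 10.3, 24.0, 20.5,
  1989, 27.6, 11.4, 14.1, 30.4, 16.5,
  2011, 16.7,  7.4, 15.4, 41.3, 19.3
)

# Sum across every column except year; percentages should total roughly 100
# (small deviations come from rounding in the source table)
row_totals <- instructors_sample %>%
  mutate(total = rowSums(across(-year))) %>%
  select(year, total)

row_totals
```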
To recreate this visualization, we first need to pivot the data so that there is one variable for faculty type and one variable for the percentage value (year is already its own column). We do this wide-to-long conversion using pivot_longer(), discussed in lecture 7.
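A minimal pivot_longer() sketch that produces the faculty_type and value columns used in the plot code. Two years are retyped from the printed output only so the example is self-contained; in your report, pipe the full instructors data into the same call.

```r
library(tidyverse)

# Two rows retyped from the printed output; use the full instructors data in your report
instructors_sample <- tribble(
  ~year, ~full_time_tenured, ~full_time_tenure_track,
  ~full_time_non_tenure_track, ~part_time, ~grad_student,
  1975, 29.0, 16.1, 10.3, 24.0, 20.5,
  2011, 16.7,  7.4, 15.4, 41.3, 19.3
)

# Keep year as is; stack the five faculty-type columns into name/value pairs
instructors_long <- instructors_sample %>%
  pivot_longer(
    cols = -year,
    names_to = "faculty_type",
    values_to = "value"
  )

instructors_long
```

Each original row becomes five rows, one per faculty type, which is the shape ggplot() needs to map faculty_type to colour.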
instructors_long
Now we can attempt to make this plot:
instructors_long %>%
  ggplot(aes(x = year, y = value, color = faculty_type)) +
  geom_line()
Include the line plot you made above in your report and make sure the figure width is large enough to make it legible. Also fix the title, axis labels, and legend label.
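One possible starting point, sketched under the assumption that you keep the same line plot: tidy the faculty-type names into readable legend labels and set the title, axis labels, and legend title with labs(). The title and label wording is only illustrative. Figure size is set in the chunk header through knitr's fig.width and fig.height options (for example fig.width = 8), not in the R code itself.

```r
library(tidyverse)

# Two years retyped from the printed output so the sketch runs on its own;
# in your report, start from the full instructors data instead.
instructors_long <- tribble(
  ~year, ~full_time_tenured, ~full_time_tenure_track,
  ~full_time_non_tenure_track, ~part_time, ~grad_student,
  1975, 29.0, 16.1, 10.3, 24.0, 20.5,
  2011, 16.7,  7.4, 15.4, 41.3, 19.3
) %>%
  pivot_longer(-year, names_to = "faculty_type", values_to = "value") %>%
  # Turn column names such as "part_time" into readable labels ("Part time")
  mutate(faculty_type = str_to_sentence(str_replace_all(faculty_type, "_", " ")))

p <- instructors_long %>%
  ggplot(aes(x = year, y = value, color = faculty_type)) +
  geom_line() +
  labs(
    title = "Instructional staff by faculty type, 1975-2011",
    x = "Year",
    y = "Percentage of hires",
    color = "Faculty type"
  )

p
```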
Suppose the objective of this plot was to show that the proportion of part-time faculty has gone up over time compared to other instructional staff types. What changes would you propose making to this plot to tell this story, and why?
Implement the changes you proposed in the previous exercise.
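As one example of the kind of change you might propose (not the only valid answer), the sketch below highlights part-time faculty in a strong colour, greys out the other faculty types, and states the story in the title. The colours, the highlight variable name, and the title wording are hypothetical choices; three years are retyped from the printed output so the code is self-contained.

```r
library(tidyverse)

# Three years retyped from the printed output; use the full data in your report
instructors_long <- tribble(
  ~year, ~full_time_tenured, ~full_time_tenure_track,
  ~full_time_non_tenure_track, ~part_time, ~grad_student,
  1975, 29.0, 16.1, 10.3, 24.0, 20.5,
  1995, 24.8,  9.6, 13.6, 33.2, 18.8,
  2011, 16.7,  7.4, 15.4, 41.3, 19.3
) %>%
  pivot_longer(-year, names_to = "faculty_type", values_to = "value")

p <- instructors_long %>%
  # Two-level variable: part-time vs. everything else
  mutate(highlight = if_else(faculty_type == "part_time",
                             "Part-time faculty", "Other faculty types")) %>%
  # group keeps one line per faculty type; colour only encodes the highlight
  ggplot(aes(x = year, y = value, group = faculty_type, color = highlight)) +
  geom_line(linewidth = 1) +
  scale_color_manual(
    values = c("Part-time faculty" = "firebrick", "Other faculty types" = "grey70")
  ) +
  labs(
    title = "The share of part-time faculty grew between 1975 and 2011",
    x = "Year",
    y = "Percentage of hires",
    color = NULL
  )

p
```

Keeping the other lines in grey, rather than dropping them, preserves the comparison the exercise asks for while drawing the eye to the part-time trend.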
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
The Fisheries and Aquaculture Department of the Food and Agriculture Organization (FAO) of the United Nations collects data on the fisheries production of countries. This Wikipedia page lists the fishery production of countries for 2016. For each country, tonnage from capture and from aquaculture is listed. Note that countries whose total harvest was less than 100,000 tons are not included in the visualization.
A researcher shared with you the following visualization they created based on these data. 😳
Load the data.
fisheries
## # A tibble: 75 x 3
## country capture aquaculture
## <chr> <dbl> <dbl>
## 1 Algeria 126259 368
## 2 Angola 240000 NA
## 3 Argentina 931472 2430
## 4 Australia 245935 47087
## 5 Bangladesh 1333866 882091
## 6 Brazil 750283 257783
## 7 Cambodia 384000 26000
## 8 Canada 1080982 154083
## 9 Chile 4330325 698214
## 10 Colombia 121000 60072
## # ... with 65 more rows
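One possible direction for the fisheries plot, offered only as a sketch: put countries on a common, sorted axis with horizontal bars split into capture and aquaculture, so totals and their composition can be compared directly. It retypes the ten rows printed above to stay self-contained. Treating a missing aquaculture value (Angola) as zero and keeping only the top five countries are assumptions made for illustration; decide deliberately how to handle both in your own version.

```r
library(tidyverse)

# The ten rows printed above, retyped so the sketch runs on its own
fisheries_sample <- tribble(
  ~country,     ~capture, ~aquaculture,
  "Algeria",      126259,          368,
  "Angola",       240000,           NA,
  "Argentina",    931472,         2430,
  "Australia",    245935,        47087,
  "Bangladesh",  1333866,       882091,
  "Brazil",       750283,       257783,
  "Cambodia",     384000,        26000,
  "Canada",      1080982,       154083,
  "Chile",       4330325,       698214,
  "Colombia",     121000,        60072
)

fisheries_top <- fisheries_sample %>%
  mutate(
    aquaculture = replace_na(aquaculture, 0),  # assumption: NA means none reported
    total = capture + aquaculture
  ) %>%
  slice_max(total, n = 5) %>%                  # illustrative cut-off
  pivot_longer(c(capture, aquaculture), names_to = "source", values_to = "tons") %>%
  mutate(source = str_to_sentence(source))

# Sort countries by total production so the longest bar is on top
p <- fisheries_top %>%
  ggplot(aes(x = tons / 1e6, y = fct_reorder(country, total), fill = source)) +
  geom_col() +
  labs(
    title = "Fishery production by source, 2016",
    x = "Production (million tons)",
    y = NULL,
    fill = "Source"
  )

p
```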
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards and review the md document on GitHub to make sure you’re happy with the final state of your work.
Go back through your write-up to make sure you’re following the coding style guidelines we discussed in class. Make any edits as needed.
Also, make sure all of your R chunks are properly labelled, and your figures are reasonably sized.
Once the last team member for the week pushes their final changes, others should pull the changes and knit the R Markdown document to confirm that they can reproduce the report.
Want to see more ugly charts?
25 points total:
* 5 questions @ 3 points each for correct and complete answers
* 5 points for GitHub commit history
* 5 points for coding style, properly labelled R chunks, and reasonably sized figures