Go to the course GitHub organization and locate your repo. Clone it in RStudio and open the R Markdown document. Knit the document to make sure it compiles without errors.
Before we introduce the data, let’s warm up with some simple exercises. Update the YAML of your R Markdown file with your information, knit, commit, and push your changes. Make sure to commit with a meaningful commit message. Then, go to your repo on GitHub and confirm that your changes are visible in your Rmd and md files. If anything is missing, commit and push again.
We’ll use the tidyverse, as well as (likely) lubridate, pheatmap, and possibly reshape2. Install them if you don’t already have them. You can load them by running the following in your Console:
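One way to handle both steps is sketched below. It assumes you only want to install packages that are missing, and only from an interactive Console session, so that knitting the Rmd never triggers a download. The names pkgs, is_installed, and missing_pkgs are illustrative, not part of the lab.

```r
# Packages this lab expects (reshape2 is only needed if you use acast()).
pkgs <- c("tidyverse", "lubridate", "pheatmap", "reshape2")

# TRUE when the package's namespace can be found in your library.
is_installed <- function(p) requireNamespace(p, quietly = TRUE)
missing_pkgs <- pkgs[!vapply(pkgs, is_installed, logical(1))]

# Install only from an interactive session, never while knitting.
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Attach whatever is available and warn about anything still missing.
for (p in pkgs) {
  if (is_installed(p)) {
    library(p, character.only = TRUE)
  } else {
    warning("Package not installed: ", p)
  }
}
```

In your Rmd itself, plain library() calls in the setup chunk are usually enough once the packages are installed.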
These data are from the New York Times COVID-19 tracking page. They were compiled by manually curating state, county, and national announcements and records. Here we examine incident (daily new) cases and their seven-day trailing rolling averages.
We use the data from March 2020 to October 2021 because this dataset already includes the rolling averages.
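A seven-day trailing rolling average replaces each day's count with the mean of that day and the six days before it, so the first six days have no value. A minimal base-R sketch on made-up counts (daily_cases and trailing_avg_7 are hypothetical, not columns or functions from the NYT data):

```r
# Toy daily counts (made up, not NYT data): low weekend reports and a
# Monday catch-up spike, i.e. a weekly reporting cycle.
daily_cases <- c(100, 120, 110, 115, 105, 40, 30,
                 180, 125, 112, 118, 108, 45, 35)

# Trailing 7-day mean: day t averages days t-6 through t.
# sides = 1 makes stats::filter() use only current and past values.
trailing_avg_7 <- function(x) {
  as.numeric(stats::filter(x, filter = rep(1 / 7, 7), sides = 1))
}

rolling <- trailing_avg_7(daily_cases)
round(rolling, 1)
```

Because every window spans exactly one full week, the weekday spikes and dips cancel out in the average. The stats:: prefix matters here: once the tidyverse is attached, a bare filter() refers to dplyr::filter().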
Repeat your previous plot, zooming in on the final 2 months of data. What are some high-frequency periodic patterns you observe in the reporting of cases?
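Zooming in amounts to filtering on date before plotting, and one quick check for a high-frequency pattern is to average cases by weekday within that window. The sketch below uses base R and an invented single-state series (covid_toy, window_days, and weekday_means are hypothetical names). In the lab you would more likely use dplyr::filter() and ggplot2.

```r
# Toy series for one state (made up): a flat baseline with a Monday
# spike and a weekend dip.
dates <- seq(as.Date("2021-06-01"), as.Date("2021-10-31"), by = "day")
wday  <- as.POSIXlt(dates)$wday   # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
covid_toy <- data.frame(
  date  = dates,
  cases = 100 + 50 * (wday == 1) - 60 * (wday %in% c(0, 6))
)

# Zoom in: keep only the last 61 days (roughly the final two months).
window_days <- 61
recent <- covid_toy[covid_toy$date > max(covid_toy$date) - window_days, ]

# Mean cases per weekday inside the window; large differences between
# weekdays suggest a weekly (7-day period) reporting cycle.
weekday_means <- tapply(recent$cases, as.POSIXlt(recent$date)$wday, mean)
weekday_means
```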
Make a plot that helps determine whether the phase of the pattern is conserved across states, and whether it is conserved across time (say, restricting attention to just the last 3 months).
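One possible starting point (a sketch, not the expected answer) is to build a state-by-weekday profile in which each row is scaled to sum to 1, so that states of very different sizes become comparable. You can then plot it as a heatmap or as one line per state, and repeat the process for different time windows. The two toy states below are made up, with reporting peaks deliberately placed on different weekdays.

```r
# Toy data (made up): state A reports a backlog on Mondays, B on Tuesdays.
dates <- seq(as.Date("2021-08-01"), as.Date("2021-10-31"), by = "day")
wday  <- as.POSIXlt(dates)$wday   # 0 = Sunday, ..., 6 = Saturday
toy <- rbind(
  data.frame(state = "A", date = dates, cases = 100 + 80 * (wday == 1)),
  data.frame(state = "B", date = dates, cases = 100 + 80 * (wday == 2))
)
toy$wday <- as.POSIXlt(toy$date)$wday

# Mean cases for each state x weekday, then normalise each row to sum to 1.
profile <- tapply(toy$cases, list(toy$state, toy$wday), mean)
profile <- profile / rowSums(profile)

# Weekday (0 = Sunday) on which each state's reporting peaks: its "phase".
peak_wday <- as.integer(colnames(profile))[apply(profile, 1, which.max)]
names(peak_wday) <- rownames(profile)
peak_wday
```

If the peak weekday differs between states, or shifts when you restrict the dates, then the phase is not conserved.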
Deriving robust rolling averages requires quite a bit of editorial decision-making, for example on days when cases are reassigned across geographical boundaries, or when case definitions changed as assays improved. See the NYT anomalies page for some discussion.
OK, so now it’s hopefully clear why using the smoothed data is important. Using cases_avg_per_100k and deaths_avg_per_100k, repeat your plot from question 1.
Describe what dplyr operations you’d need to calculate the following. No need to actually write the code (unless you think that would be more concise than explaining in words).
Now, convert the cases data into a wide format and cast it to a matrix. You can do this either with pivot_wider, then select (keeping only the state columns, not the date), then as.matrix, and lastly setting the rownames to the date corresponding to each row; or in one operation using reshape2::acast. Verify that your matrix is numeric.
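For reference, here is a base-R analogue of the one-step acast route, swapped in so that the sketch runs without extra packages. tapply() cross-tabulates a long data frame into a date-by-state matrix. The data frame long and its values are made up. The tidyverse route would start from pivot_wider(names_from = state, values_from = cases).

```r
# Toy long-format data (made up): one row per date x state.
long <- data.frame(
  date  = rep(as.Date("2021-10-01") + 0:2, each = 3),
  state = rep(c("Alabama", "Alaska", "Arizona"), times = 3),
  cases = c(10, 2, 30, 12, 3, 28, 9, 1, 35)
)

# Rows = dates, columns = states; each cell holds that pair's value.
# Date/state pairs with no rows in `long` would become NA.
cases_mat <- tapply(long$cases,
                    list(as.character(long$date), long$state), sum)

is.matrix(cases_mat)
is.numeric(cases_mat)
cases_mat
```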
Calculate the quantities discussed in question 5, this time actually carrying out the calculations. You should use the functions base::rowSums, row_var (which I have defined below for you), and which.max. When was it easier to work with the long data in dplyr? When was it easier to work with the wide matrix data?
pheatmap::pheatmap. Discuss. Also, try experimenting with making a heatmap of the per-capita cases matrix. Could this be used to define waves of the pandemic?
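A sketch of the heatmap idea, under stated assumptions: the per-100k matrix below is invented (one Gaussian-shaped "wave" that reaches states C and D about ten days later than A and B), and it is transposed so that time runs left to right. Turning off column clustering keeps the dates in order, while the states are still free to cluster. A wave then shows up as a band of high values across many states. If pheatmap is not installed, the code falls back to base R's stats::heatmap().

```r
# Toy per-100k matrix (made up): 60 days x 4 states.
days <- 1:60
wave <- function(peak) 50 * exp(-((days - peak) / 8)^2)
per100k <- cbind(A = wave(20), B = wave(22), C = wave(32), D = wave(30))
rownames(per100k) <- as.character(as.Date("2021-01-01") + days - 1)

if (requireNamespace("pheatmap", quietly = TRUE)) {
  # States as rows; keep dates (columns) in time order; hide 60 date labels.
  pheatmap::pheatmap(t(per100k), cluster_cols = FALSE, show_colnames = FALSE)
} else {
  # Colv = NA suppresses column reordering so the time axis stays intact.
  stats::heatmap(t(per100k), Colv = NA, scale = "none")
}
```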
I contend it’s pronounced “feet”-map.
🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is clear afterwards, and review the md document on GitHub to make sure you’re happy with the final state of your work.
Go back through your write up to make sure you’re following coding style guidelines we discussed in class. Make any edits as needed.
Also, make sure all of your R chunks are properly labelled, and your figures are reasonably sized.
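Chunk labels go right after r in the chunk header (for example, {r cases-heatmap}), and figure sizes can be set per chunk or globally. Below is a sketch of the global route, assuming the code sits in your setup chunk. The 7 by 4 inch values are arbitrary examples, not course requirements.

```r
# Default figure options for every chunk that follows (sizes in inches).
knitr::opts_chunk$set(
  fig.width  = 7,
  fig.height = 4,
  fig.align  = "center"
)
```

Individual chunks can still override these options, for example a taller heatmap with fig.height = 6 in that chunk's header.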
Once the team leader for the week pushes their final changes, the other team members should pull the changes and knit the R Markdown document to confirm that they can reproduce the report.
26 points