In this assignment we’ll look at traffic accidents in New York State. The dataset covers all recorded accidents in NY in 2018 and 2019. Some of the variables were modified for the purposes of this assignment.

Getting started

Go to the course GitHub organization and locate your homework repo, which should be named hw-2-YOUR_GITHUB_USERNAME. Grab the URL of the repo, and clone it in RStudio. First, open the R Markdown document hw02.Rmd and Knit it. Make sure it compiles without errors. The output will be a Markdown (.md) file with the same name.

Warm up

Before we introduce the data, let’s warm up with some simple exercises.

Update the YAML, changing the author name to your name, and knit the document.
Commit your changes with a meaningful commit message.
Push your changes to GitHub.
Go to your repo on GitHub and confirm that your changes are visible in your Rmd and md files. If anything is missing, commit and push again.

Packages

We’ll use the tidyverse package for much of the data wrangling and visualization, and vroom to load the .csv. Using vroom is purely a convenience: it can read a compressed .csv file directly, so we don’t have to decompress the file before reading it. We’ll also need the lubridate package to wrangle our dates. These packages are already installed for you. You can load them by running the following in your Console:

library(tidyverse)
library(lubridate)
library(vroom)

Data

We can load the data with the following:

crashes <- vroom("https://urmc-bst.github.io/bst430-fall2024-site/hw_lab_instruction/hw02-accidents/data/ny_collisions_2018_2019.csv.gz")
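To see why vroom is convenient for compressed files, here is a minimal sketch on toy data. The toy data frame, the temporary file path, and the choice of gzip compression are assumptions made for illustration; none of them come from the assignment data.

```r
library(vroom)

# Toy data frame standing in for the real crash data (hypothetical values).
toy <- data.frame(id = 1:3, kind = c("a", "b", "c"))

# Write a gzip-compressed CSV to a temporary file using base R.
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# vroom reads the compressed file directly; no manual decompression step.
toy_back <- vroom(path, delim = ",", show_col_types = FALSE)
nrow(toy_back)
```

The same call pattern works with a URL in place of a local path, which is how the assignment data is loaded.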

You can find out more about the dataset in the NY open data portal: https://data.ny.gov/Transportation/Motor-Vehicle-Crashes-Case-Information-Three-Year-/e8ky-4vqe . There’s a detailed data dictionary here.

Exercises

How many observations (rows) does the dataset have? Instead of hard coding the number in your answer, use inline code.
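As a rough illustration of what "inline code" means here, the sketch below computes a row count on a toy data frame instead of typing the number by hand. The data frame `toy` and the sentence wording are hypothetical; in your document you would refer to your own data object.

```r
# Toy data frame (hypothetical); in the assignment you would use your own data.
toy <- data.frame(x = c(10, 20, 30, 40))

# nrow() returns the number of observations (rows).
n_obs <- nrow(toy)

# In the text of an R Markdown file, an inline expression such as `r nrow(toy)`
# is replaced by its computed value when the document is knit.
sentence <- paste("The dataset has", n_obs, "observations.")
sentence
```

Because the number is computed at knit time, the text stays correct even if the data changes.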

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Make a simple table counting occurrences of the Crash Descriptor. Use this table and the existing levels in Crash Descriptor to add another variable called severity. Make this variable a factor with shorter yet descriptive level names. Set the factor levels so that they are ordered by severity. In your answer, don’t forget to label your R chunk(s) as well (where it says label-me-1). Your label should be short and informative, shouldn’t include spaces, and shouldn’t repeat a previous label.
Add a column dt to crashes which converts the Date column to an appropriate date class using lubridate.
Add a new column decimal_hours that converts Time into fractional hours since midnight, also using lubridate.
Recreate the following plot, and describe it in the context of the data. Describe the patterns you see for Property accidents vs Fatal accidents on the weekdays vs weekends.
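The recoding and date-handling steps above can be sketched on toy data as follows. Everything in this example is an assumption for illustration: the category names, the month/day/year date format, and the "H:M" time strings are made up, so check the actual columns (and how vroom parsed them) before adapting it. If a column is already stored as a date or time class, the parsing step will differ.

```r
library(lubridate)

# Toy data (hypothetical values and formats; inspect the real columns first).
toy <- data.frame(
  descriptor = c("Minor Event", "Major Event", "Minor Event", "Moderate Event"),
  date = c("1/5/2019", "12/31/2018", "7/4/2019", "3/15/2019"),
  time = c("0:15", "14:30", "9:45", "23:00")
)

# Count occurrences of each category.
table(toy$descriptor)

# Recode to shorter labels and fix the level order from least to most severe.
toy$severity <- factor(
  toy$descriptor,
  levels = c("Minor Event", "Moderate Event", "Major Event"),
  labels = c("Minor", "Moderate", "Major")
)

# Parse month/day/year strings into Date objects.
toy$dt <- mdy(toy$date)

# Parse "H:M" strings into Periods, then express them as fractional hours.
tm <- hm(toy$time)
toy$decimal_hours <- hour(tm) + minute(tm) / 60

toy
```

Setting `levels` explicitly matters because plots and tables follow the level order, not the order in which values appear in the data.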

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.

Recreate this plot:

Hint: use lubridate::month to extract the numeric index of the month.
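A quick sketch of how lubridate::month behaves, using a few made-up dates (the dates are assumptions for the example, not values from the crash data):

```r
library(lubridate)

# Hypothetical dates for illustration.
d <- ymd(c("2018-01-15", "2019-07-04", "2019-12-31"))

# Numeric month index (1 = January, 12 = December).
m <- month(d)
m

# With label = TRUE the result is an ordered factor of month names,
# which can be handy for axis labels.
month(d, label = TRUE)
```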

On what date did the highest total number of accidents occur? Examine the data and columns provided, and see if you can determine a cause for the date with the highest total number of accidents. In general, what is a possible explanation for the pattern observed between warm-season (May-Oct) and cold-season (Nov-Apr) Total and Fatal accidents?
Create another data visualization based on these data and interpret it. You can choose any variables and any type of visualization you like, but it must have at least three variables, e.g. a scatterplot of x vs. y isn’t enough, but if points are colored or faceted by z, that’s fine. In your answer, don’t forget to label your R chunk as well.
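To make the "at least three variables" requirement concrete, here is a minimal sketch using the mpg dataset that ships with ggplot2 as a stand-in; the dataset and the variable choices are assumptions for illustration only and have nothing to do with the crash data. Two variables are mapped to the axes, a third to color, and a fourth to facets.

```r
library(ggplot2)

# mpg ships with ggplot2; it is used here only as a stand-in dataset.
p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(alpha = 0.6) +   # x vs. y scatterplot
  facet_wrap(~year) +         # one panel per model year
  labs(
    x = "Engine displacement (L)",
    y = "Highway miles per gallon",
    color = "Drive train"
  )
p
```

Either the color mapping or the faceting alone would be enough to bring in a third variable.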

🧶 ✅ ⬆️ Knit, commit, and push your changes to GitHub with an appropriate commit message. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards. Then review the md document and the lintr report on GitHub to make sure you’re happy with the final state of your work.

Rubric: 29 points total

8 exercises at 3 points each
5 points for GitHub commits