BST430 Lecture 4

.title[
# BST430 Lecture 4
]
.subtitle[
## Data types in R
]
.author[
### Tanzy Love, based on the course by Andrew McDavid
]
.institute[
### U of Rochester
]
.date[
### 2021-09-01 (updated: 2024-09-10 by TL)
]

---

# Why should you care about data types?

---

## Example: Cat lovers

A survey asked respondents their name and number of cats. The instructions said to enter the number of cats as a numerical value.

``` r
cat_lovers = read_csv("l04/data/cat-lovers.csv")
```

``` r
cat_lovers
```

```
## # A tibble: 60 × 3
##   name           number_of_cats handedness
##   <chr>          <chr>          <chr>     
## 1 Bernice Warren 0              left      
## 2 Woodrow Stone  0              left      
## 3 Willie Bass    1              left      
## 4 Tyrone Estrada 3              left      
## # ℹ 56 more rows
```

Here's the [R code in this lecture](l04/l04-data-types-i.R)

Here's the [datafile](l04/data/cat-lovers.csv)
*You have to have a good filepath to the dataset*
---

## Oh why won't you work?!

``` r
cat_lovers %>%
  summarise(mean_cats = mean(number_of_cats))
```

```
## Warning: There was 1 warning in `summarise()`.
## ℹ In argument: `mean_cats = mean(number_of_cats)`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
```

```
## # A tibble: 1 × 1
##   mean_cats
##       <dbl>
## 1        NA
```

---

``` r
?mean
```

---

## Oh why won't you still work??!!

``` r
cat_lovers %>%
  summarise(mean_cats = mean(number_of_cats, na.rm = TRUE))
```

```
## Warning: There was 1 warning in `summarise()`.
## ℹ In argument: `mean_cats = mean(number_of_cats, na.rm = TRUE)`.
## Caused by warning in `mean.default()`:
## ! argument is not numeric or logical: returning NA
```

```
## # A tibble: 1 × 1
##   mean_cats
##       <dbl>
## 1        NA
```

---

## Take a breath and look at your data

``` r
glimpse(cat_lovers)
```

```
## Rows: 60
## Columns: 3
## $ name           <chr> "Bernice Warren", "Woodrow Stone", "Will…
## $ number_of_cats <chr> "0", "0", "1", "3", "3", "2", "1", "1", …
## $ handedness     <chr> "left", "left", "left", "left", "left", …
```
--

---

## Let's take another look

.small[
<div class="datatables html-widget html-fill-item" id="htmlwidget-1b4ff99564eb6e8884a5" style="width:100%;height:auto;"></div>
<script type="application/json" data-for="htmlwidget-1b4ff99564eb6e8884a5">{"x":{"filter":"none","vertical":false,"data":[["1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16","17","18","19","20","21","22","23","24","25","26","27","28","29","30","31","32","33","34","35","36","37","38","39","40","41","42","43","44","45","46","47","48","49","50","51","52","53","54","55","56","57","58","59","60"],["Bernice Warren","Woodrow Stone","Willie Bass","Tyrone Estrada","Alex Daniels","Jane Bates","Latoya Simpson","Darin Woods","Agnes Cobb","Tabitha Grant","Perry Cross","Wanda Silva","Alicia Sims","Emily Logan","Woodrow Elliott","Brent Copeland","Pedro Carlson","Patsy Luna","Brett Robbins","Oliver George","Calvin Perry","Lora Gutierrez","Charlotte Sparks","Earl Mack","Leslie Wade","Santiago Barker","Jose Bell","Lynda Smith","Bradford Marshall","Irving Miller","Caroline Simpson","Frances Welch","Melba Jenkins","Veronica Morales","Juanita Cunningham","Maurice Howard","Teri Pierce","Phil Franklin","Jan Zimmerman","Leslie Price","Bessie Patterson","Ethel Wolfe","Naomi Wright","Sadie Frank","Lonnie Cannon","Tony Garcia","Darla Newton","Ginger Clark","Lionel Campbell","Florence Klein","Harriet Leonard","Terrence Harrington","Travis Garner","Doug Bass","Pat Norris","Dawn Young","Shari Alvarez","Tamara Robinson","Megan Morgan","Kara Obrien"],["0","0","1","3","3","2","1","1","0","0","0","0","1","3","3","2","1","1","0","0","1","1","0","0","4","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","0","1","3","3","2","1","1.5 - honestly I think one of my cats is half human","0","0","1","0","1","three","1","1","1","0","0","2"],["left","left","left","left","left","left","left","left","left","left","left","left","left","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","right","ambidextrous","ambidextrous","ambidextrous","ambidextrous","ambidextrous"]],"container":"<table class=\"display\">\n  <thead>\n    <tr>\n      <th> <\/th>\n      <th>name<\/th>\n      <th>number_of_cats<\/th>\n      <th>handedness<\/th>\n    <\/tr>\n  <\/thead>\n<\/table>","options":{"columnDefs":[{"orderable":false,"targets":0},{"name":" ","targets":0},{"name":"name","targets":1},{"name":"number_of_cats","targets":2},{"name":"handedness","targets":3}],"order":[],"autoWidth":false,"orderClasses":false}},"evals":[],"jsHooks":[]}</script>
]

---

## Sometimes you might need to babysit your respondents

``` r
cat_lovers %>%
  mutate(number_of_cats = case_when(
    name == "Ginger Clark" ~ 2,
    name == "Doug Bass"    ~ 3,
    TRUE                   ~ as.numeric(number_of_cats)
    )) %>%
  summarise(mean_cats = mean(number_of_cats))
```

```
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `number_of_cats = case_when(...)`.
## Caused by warning:
## ! NAs introduced by coercion
```

```
## # A tibble: 1 × 1
##   mean_cats
##       <dbl>
## 1     0.833
```
]

---

## Always you need to respect data types

``` r
cat_lovers %>%
  mutate(
    number_of_cats = case_when(
      name == "Ginger Clark" ~ "2",
      name == "Doug Bass"    ~ "3",
      TRUE                   ~ number_of_cats
      ),
    number_of_cats = as.numeric(number_of_cats)
    ) %>%
  summarise(mean_cats = mean(number_of_cats))
```

```
## # A tibble: 1 × 1
##   mean_cats
##       <dbl>
## 1     0.833
```

---

## Now that we know what we're doing...

``` r
*cat_lovers = cat_lovers %>%
  mutate(
    number_of_cats = case_when(
      name == "Ginger Clark" ~ "2",
      name == "Doug Bass"    ~ "3",
      TRUE                   ~ number_of_cats
      ),
    number_of_cats = as.numeric(number_of_cats)
    )
```

---

## Moral of the story

- If your data does not behave how you expect it to, type coercion upon reading in the data might be the reason.
- Go in and investigate your data, apply the fix, *save your data*, live happily ever after.

---

.hand[.light-blue[now that we have a good motivation for]]  
.hand[.light-blue[learning about data types in R]]

<br>

---

# Data types

---

## Atomic data types in R

These are the fundamental building blocks (**atoms**) of all R vectors (and all data in R is a vector!)
- **logical**
- **integer** numbers
- **double** (real) numbers
- **complex** numbers
- **character**
- and some more, but we won't be discussing these yet.

---

## Logical & character

``` r
typeof(TRUE)
```

```
## [1] "logical"
```
]
.pull-right[
**character** - character strings

``` r
typeof("hello")
```

```
## [1] "character"
```
]

---

## Double & integer

``` r
typeof(1.335)
```

```
## [1] "double"
```

``` r
typeof(7)
```

```
## [1] "double"
```
]
.pull-right[
**integer** - integer numerical values (indicated with an `L`)

``` r
typeof(7L)
```

```
## [1] "integer"
```

``` r
typeof(1:3)
```

```
## [1] "integer"
```
]

---

## Complex numbers

R also natively supports complex numbers, which are their own type:
.pull-left[

``` r
roots_of_unity = 
  c(1+0i, -1+0i, 0+1i, 0-1i)
typeof(roots_of_unity)
```

```
## [1] "complex"
```

``` r
roots_of_unity^2
```

```
## [1]  1+0i  1+0i -1+0i -1+0i
```
]
.pull-right[

``` r
roots_of_unity^4
```

```
## [1] 1+0i 1+0i 1+0i 1+0i
```

``` r
Re(roots_of_unity)
```

```
## [1]  1 -1  0  0
```

``` r
Im(roots_of_unity)
```

```
## [1]  0  0  1 -1
```
]

---

## Lists

**Lists** are 1d objects that can contain any combination of R objects

``` r
mylist = list("A", 1:4, c(TRUE, FALSE))
mylist
```

```
## [[1]]
## [1] "A"
## 
## [[2]]
## [1] 1 2 3 4
## 
## [[3]]
## [1]  TRUE FALSE
```
]]

``` r
*str(mylist)
```

```
## List of 3
##  $ : chr "A"
##  $ : int [1:4] 1 2 3 4
##  $ : logi [1:2] TRUE FALSE
```
]

---

# `str` is our friend
.font130[
*  shows the *str*ucture of the data

*  `str` is *nearly* a synonym for `glimpse`

*  It shows detailed information on the composition of object.

*  You should reach for it first when you are trying to understand the low-level properties of an R object.
]
---

## Concatenation

Vectors can be constructed and **concatenated** using the `c()` function.
.pull-left[.small[

``` r
digits = c(1, 2, 3)
hello = c("Hello", "World!")
greet = c(c("hi", "hello"), 
              c("bye", "jello"))
```
]]

``` r
str(digits)
```

```
##  num [1:3] 1 2 3
```

``` r
str(hello)
```

```
##  chr [1:2] "Hello" "World!"
```

``` r
str(greet)
```

```
##  chr [1:4] "hi" "hello" "bye" "jello"
```
]

---

## Vector length

Get the number of entries in a vector with `length(x)`.

``` r
x = c(1, 2, 3)
y = character(2)
empty_dbl = numeric(0)
empty_chr = character(0)
```
]

``` r
length(x)
```

```
## [1] 3
```

``` r
length(y)
```

```
## [1] 2
```

``` r
length(empty_dbl)
```

```
## [1] 0
```

``` r
length(empty_chr)
```

```
## [1] 0
```
]
---

## Concatenation

Lists can also be concatenated using the `c()` function.

``` r
list1 = list(1, 2, 3)
list2 = list(
  c("Hi!", "I'm a vector", 
    "nested inside", "a list"))
cat12 = c(list1, list2)
```
]
.pull-right[

``` r
str(cat12)
```

```
## List of 4
##  $ : num 1
##  $ : num 2
##  $ : num 3
##  $ : chr [1:4] "Hi!" "I'm a vector" "nested inside" "a list"
```
]

Compare to `list(list1, list2)` which would nest the two lists rather than join them.
---

## length(c(x, y)) = length(x) + length(y)

``` r
length(list1)
```

```
## [1] 3
```

``` r
length(list2)
```

```
## [1] 1
```

``` r
length(c(list1, list2))
```

```
## [1] 4
```

## Named lists

We often want to name the elements of a list (can also do this with vectors). This can make reading and accessing the list more 
straight forward.

``` r
myotherlist = list(A = "hello", B = 1:4, "knock knock" = "who's there?")
str(myotherlist)
```

```
## List of 3
##  $ A          : chr "hello"
##  $ B          : int [1:4] 1 2 3 4
##  $ knock knock: chr "who's there?"
```

``` r
names(myotherlist)
```

```
## [1] "A"           "B"           "knock knock"
```

``` r
myotherlist$B
```

```
## [1] 1 2 3 4
```
]

---

## unlisting

Really, this should be called unnesting.  Often, it is [a code smell](https://en.wikipedia.org/wiki/Code_smell) in R, and indicates an issue with how the code was designed

``` r
str(myotherlist)
```

```
## List of 3
##  $ A          : chr "hello"
##  $ B          : int [1:4] 1 2 3 4
##  $ knock knock: chr "who's there?"
```

``` r
unlist(myotherlist, recursive = TRUE)
```

```
##              A             B1             B2             B3 
##        "hello"            "1"            "2"            "3" 
##             B4    knock knock 
##            "4" "who's there?"
```

## Converting between types

``` r
x = 1:3
x
```

```
## [1] 1 2 3
```

``` r
typeof(x)
```

```
## [1] "integer"
```
]
--
.pull-right[

``` r
y = as.character(x)
y
```

```
## [1] "1" "2" "3"
```

``` r
typeof(y)
```

```
## [1] "character"
```
]

---

## Converting between types

``` r
x = c(TRUE, FALSE)
x
```

```
## [1]  TRUE FALSE
```

``` r
typeof(x)
```

```
## [1] "logical"
```
]
--
.pull-right[

``` r
y = as.numeric(x)
y
```

```
## [1] 1 0
```

``` r
typeof(y)
```

```
## [1] "double"
```
]

---

## Converting between types

R will happily convert between various types without complaint when different types of data are concatenated in a vector, and that's not always a great thing!

``` r
c(1, "Hello")
```

```
## [1] "1"     "Hello"
```

``` r
c(FALSE, 3L)
```

```
## [1] 0 3
```
]
--
.pull-right[

``` r
c(1.2, 3L)
```

```
## [1] 1.2 3.0
```

``` r
c(2L, "two")
```

```
## [1] "2"   "two"
```
]

---

## Explicit vs. implicit coercion

Let's give formal names to what we've seen so far:

--
- **Explicit coercion** is when you call a function like `as.logical()`, `as.numeric()`, `as.integer()`, `as.double()`, or `as.character()`

--
- **Implicit coercion** happens when you use a vector in a specific context that expects a certain type of vector

---

# Special values

---

## Special values

- `NA`: Not available
- `NaN`: Not a number
- `Inf`: Positive infinity
- `-Inf`: Negative infinity

``` r
pi / 0
```

```
## [1] Inf
```

``` r
0 / 0
```

```
## [1] NaN
```
]
.pull-right[

``` r
1/0 - 1/0
```

```
## [1] NaN
```

``` r
1/0 + 1/0
```

```
## [1] Inf
```
]

---

## `NA`s are special ❄️s

``` r
x = c(1, 2, 3, 4, NA)
```

``` r
mean(x)
```

```
## [1] NA
```

``` r
mean(x, na.rm = TRUE)
```

```
## [1] 2.5
```

``` r
summary(x)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    1.00    1.75    2.50    2.50    3.25    4.00       1
```

---

## `NA`s are, by default, logical

R uses `NA` to represent missing values in its data structures.

``` r
typeof(NA)
```

```
## [1] "logical"
```

.footnote[There are also ```NA_integer_```, ```NA_real_``` and ```NA_character_```, occasionally needed to avoid warnings or errors about unplanned coercions.]
---

## Mental model for `NA`s

- Unlike `NaN`, `NA`s are genuinely unknown values
- But that doesn't mean they can't function in a logical way
- Let's think about why `NA`s are logical...

``` r
# TRUE or NA
TRUE | NA
```

```
## [1] TRUE
```
]
.pull-right[

``` r
# FALSE or NA
FALSE | NA
```

```
## [1] NA
```
]

`$\rightarrow$` See next slide for answers...

---

- `NA` is unknown, so it could be `TRUE` or `FALSE`

``` r
TRUE | TRUE  # if NA was TRUE
```

```
## [1] TRUE
```

``` r
TRUE | FALSE # if NA was FALSE
```

```
## [1] TRUE
```
]
]
.pull-right[
.midi[
- `FALSE | NA`

``` r
FALSE | TRUE  # if NA was TRUE
```

```
## [1] TRUE
```

``` r
FALSE | FALSE # if NA was FALSE
```

```
## [1] FALSE
```
]
]

- Doesn't make sense for mathematical operations 
- Makes sense in the context of missing data

---

## Vectors vs. lists

``` r
x = c(8,4,7)
```
]
.small[

``` r
x[1]
```

```
## [1] 8
```
]
.small[

``` r
x[[1]]
```

```
## [1] 8
```
]
]
--
.pull-right[
.small[

``` r
y = list(8,4,7)
```
]
.small[

``` r
y[2]
```

```
## [[1]]
## [1] 4
```
]
.small[

``` r
y[[2]]
```

```
## [1] 4
```
]
]

<br>

**Note:** When using tidyverse code you'll rarely need to refer to elements using square brackets, but it's good to be aware of this syntax, especially since you might encounter it when searching for help online.

---

## Vectors vs lists: the punchline

*  Plain vectors must be flat and "atomic"--comprised of only a single base R type: `logical`, `integer`, `numeric`, `complex` or `character`.

*  Lists can be arbitrarily nested and contain any R object.

*  Both have length.

*  Both can be named.

---

# R Classes and attributes

---

## types are elemental

``` r
typeof(1)
```

```
## [1] "double"
```
*

``` r
typeof("A")
```

```
## [1] "character"
```
*

``` r
typeof(list(1))
```

```
## [1] "list"
```
]

.pull-right[
**Meatspace elements**
* hydrogen
<img src = "l04/img/320px-Hydrogen_discharge_tube.jpg" width = "48%">
* carbon
<img src = "l04/img/Graphite-and-diamond-with-scale.jpg" width = "48%">
* uranium
<img src = "l04/img/600px-HEUraniumC.jpg" width = "48%">
]

???

These types can either be atomic (integer, character, numeric, boolean) or generic (lists).

---
class: code70

## Attributes are add-on properties

``` r
attributes(1)
```

```
## NULL
```
* 
.scroll-box-10[

``` r
attributes(starwars)
```

```
## $names
##  [1] "name"       "height"     "mass"       "hair_color"
##  [5] "skin_color" "eye_color"  "birth_year" "sex"       
##  [9] "gender"     "homeworld"  "species"    "films"     
## [13] "vehicles"   "starships" 
## 
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
## [21] 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## [41] 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## [61] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
## [81] 81 82 83 84 85 86 87
## 
## $class
## [1] "tbl_df"     "tbl"        "data.frame"
```
]]

---

## classes are compounds

`class` is an `attribute` that signifies a **compound type** (made up of multiple elements or compounds).

`class` is a special attribute. It affects what flavor of a function is applied.  We'll discuss this in greater detail in a few weeks.

---
class: code70

## Classes

``` r
class(starwars)
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```
*

``` r
class(1)
```

```
## [1] "numeric"
```
*

``` r
class(ggplot(starwars, aes(x = weight))+ geom_histogram())
```

```
## [1] "gg"     "ggplot"
```
]

.pull-right[
**Meatspace compounds**
*  `$H_2O$`
<img src = "l04/img/640px-Ice_Block.jpg" width = "50%">
*  NaCl
<img src = "l04/img/640px-Salt_Farmers.jpg" width = "50%">
*  `$C_8H_{10}N_4O_2$`
<img src = "l04/img/640px-A_small_cup_of_coffee.jpeg" width = "50%">
]

---

# R Data "sets"

---

## Rectangular Data "sets" in R

- A rectangular (spreadsheet-like) data "set" can be one of the following class:
    + `tibble`
    + `data.frame`
- We'll often work with `tibble`s:
    + `readr` package (e.g. `read_csv` function) loads data as a `tibble` by default
    + `tibble`s are part of the tidyverse, so they work well with other packages we are using
    + they implement safer and more sensible defaults, so are less likely to cause hard to track bugs in your code

---

## Data frames

- A data frame is the most commonly used data structure in R, they are just a `list` of equal length vectors (usually atomic). Each vector is treated as a column and elements of the vectors as rows.

- A tibble is a type of data frame that ... makes your life (i.e. data analysis) easier.

- Most often a data frame will be constructed by reading in from a file, but we can also create them from scratch.
---

## Data frames

``` r
df = tibble(x = 1:3, y = c("a", "b", "c"))
typeof(df)
```

```
## [1] "list"
```

``` r
class(df)
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```

``` r
str(df)
```

```
## tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
##  $ x: int [1:3] 1 2 3
##  $ y: chr [1:3] "a" "b" "c"
```

---

## Data frames (cont.)

``` r
attributes(df)
```

```
## $class
## [1] "tbl_df"     "tbl"        "data.frame"
## 
## $row.names
## [1] 1 2 3
## 
## $names
## [1] "x" "y"
```

``` r
typeof(df$y)
```

```
## [1] "character"
```

---

## Working with tibbles in pipelines

``` r
mean_cats = cat_lovers %>%
  summarise(mean_cats = mean(number_of_cats))

cat_lovers %>%
  filter(number_of_cats < mean_cats) %>%
  nrow()
```

```
## [1] 60
```

???

Why this works within an error or warning is an entirely different question, and relates to the bowels of the `data.frame` methods for the `groupGeneric`.  I still can't figure out what method is being dispatched here and why it does what it does...
---

## A result of a pipeline is always a tibble

``` r
mean_cats
```

```
## # A tibble: 1 × 1
##   mean_cats
##       <dbl>
## 1     0.833
```

``` r
str(mean_cats)
```

```
## tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
##  $ mean_cats: num 0.833
```

---

## `pull()` can be your new best friend

But use it sparingly!

``` r
mean_cats = cat_lovers %>%
  summarise(mean_cats = mean(number_of_cats)) %>%
  pull()

cat_lovers %>%
  filter(number_of_cats < mean_cats) %>%
  nrow()
```

```
## [1] 32
```

``` r
mean_cats
```

```
## [1] 0.8333333
```

``` r
class(mean_cats)
```

```
## [1] "numeric"
```

---

## How does tidyverse *want* you to do that code?

``` r
 cat_lovers %>%
     filter(number_of_cats < mean(number_of_cats)) %>%
     nrow()
```

```
## [1] 32
```

This will work if the number that you need to calculate comes from **the same** dataset as the one you are filtering

Note: since 2022, someone has added a warning that displays when you run the code as it appears in the notes.
I don't think it's useful for identifying the problem here, but at least it might make you look at the results and see if they are correct.

> Warning: Using one column matrices in `filter()` was deprecated in dplyr 1.1.0.
> ℹ Please use one dimensional logical vectors instead.

---

# Factors

---

## Factors

Factor objects are how R stores data for categorical variables (fixed numbers of discrete values).

``` r
x = factor(c("BS", "MS", "PhD", "MS"))
```

``` r
attributes(x)
```

```
## $levels
## [1] "BS"  "MS"  "PhD"
## 
## $class
## [1] "factor"
```

``` r
typeof(x)
```

```
## [1] "integer"
```

---

## Read data in as character strings

``` r
str(cat_lovers)
```

```
## tibble [60 × 3] (S3: tbl_df/tbl/data.frame)
##  $ name          : chr [1:60] "Bernice Warren" "Woodrow Stone" "Willie Bass" "Tyrone Estrada" ...
##  $ number_of_cats: num [1:60] 0 0 1 3 3 2 1 1 0 0 ...
##  $ handedness    : chr [1:60] "left" "left" "left" "left" ...
```

---

## But coerce when plotting

``` r
ggplot(cat_lovers, mapping = aes(x = handedness)) +
  geom_bar()
```

---

## Use forcats to manipulate factors

``` r
cat_lovers = cat_lovers %>%
  mutate(handedness = fct_relevel(handedness, 
                                  "right", "left", "ambidextrous"))
```

``` r
ggplot(cat_lovers, mapping = aes(x = handedness)) +
  geom_bar()
```

---

## .pull-left[
Come for the functionality
]

... stay for the logo

- R uses factors to handle categorical variables, variables that have a fixed and known set of possible values. For historical reasons many base R functions automatically convert character vectors to factors, and have been heartily cursed by generations of R programmers for this default behavior.

- Factors **are** useful when you have true categorical data, and when you want to override the ordering of character vectors to improve display. The forcats package provides a suite of useful tools that solve common problems with factors.

Source: [forcats.tidyverse.org](http://forcats.tidyverse.org/)

---

## Recap

- Best to think of data as part of a tibble
    + This plays nicely with the `tidyverse` as well
    + Rows are observations, columns are variables
    
- Be careful about data types / classes
    + Sometimes `R` makes silly assumptions about your data class 
        +  `tibble`s have safer defaults, but won't fold laundry for you
        + Think about your data in context, e.g. 0/1 variable is most likely a `factor`
    + If a plot/output is not behaving the way you expect, first
    investigate the data class with `str`
    + If you are absolutely sure how you want a factor, over-write it 
    so that you don't need to keep having to keep track of it
        + `mutate` the variable with the correct class
        
---

# Acknowledgments

This lecture contains materials adapted from  [Mine Çetinkaya-Rundel and colleagues](https://www2.stat.duke.edu/courses/Spring18/Sta199/slides/lec-slides/05b-coding-style-data-types.html#1) and [data science in a box](https://rstudio-education.github.io/datascience-box/course-materials/slides/u2-d10-data-types/u2-d10-data-types.html#1)