The following instructions will get you ready for Thursday’s lab. I expect you to spend no more than 90 minutes on this in advance. You are not required to turn in any homework, but do spend the time and come to class ready to ask questions. See if you can do all the little exercises below (don’t worry if you get stuck, though).

# Setting up the software

Here are the software installation instructions again (they are also on Canvas).

Install R: download the installer for your operating system from the R Project.

Install the RStudio IDE: download this relatively friendly interface for using R from RStudio.

Start RStudio. Make sure you’re connected to the internet. Next to the `>` prompt, copy and paste the following two lines of magical incantation exactly as written and press Return:

`install.packages("remotes")`

`remotes::install_github("agoldst/dataculture", dependencies=TRUE)`

You will see messages about downloading and installing. If the installation succeeds, the last message reads:

`* DONE (dataculture)`

and a new `>` prompt appears.

**Optional:** Next to the `>` prompt, paste the following incantation and press Return:

`remotes::install_github("lmullen/genderdata")`

This downloads some biggish data files. I have been having trouble making sure it is working, so I am not going to use this data in the lab on October 27. However, if you are interested in pursuing names and naming data further, this step will get you some useful and important sources.

# Make R do something

Now that you have started RStudio, what can you do with the thing?

Interact with R by typing at the command prompt `>`. When you type a line and press Return, R attempts to follow the instructions you have given it. In particular, R is a calculator and sees every line as a problem: *Compute the value of this expression*. When you press Return, R will print out either the value of the expression (preceded by `[1]`, whose meaning I will explain later) or an error. Then it will print a new `>` prompt.

Everywhere you see code below, try typing it in yourself and seeing what happens. Code will be in typewriter font on a gray background, like this:

`"Code"`

For starters, try out:

`1`

(Click next to the `>` prompt, type `1`, and press Return. That’s all!)

Now try:

`17.1 + 23.9`

Now try:

`what is this`

What happens?

Don’t be afraid of errors and mistakes. You can’t break R. At worst you can quit the program and start again. But R’s error messages are notoriously hard to understand. In this case, `unexpected symbol` might as well say `DOES NOT COMPUTE`.

It is traditional to begin by using R as a calculator. This works just about like you’d expect. One thing to notice is that R doesn’t really care how many spaces you use between numbers and arithmetic operators. Try:

`1 + 2 + 3+4`

Nonetheless, it is good to get in the habit of separating things by exactly one space. If you remember the “order of operations” from school, you’ll also remember that sometimes we need parentheses to be unambiguous about what order to do things. This works just the same in R:

`(1 + 2) * 3`
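To see why the parentheses matter, try the same numbers with and without them. R follows the usual school rule that multiplication happens before addition, so the two lines below give different answers:

```r
1 + 2 * 3    # 2 * 3 is computed first, then 1 is added: 7
(1 + 2) * 3  # the parentheses are computed first, then multiplied by 3: 9
```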

# Letters and strings

R is a calculator, but because this is a data and *culture* class, it’s important to learn right away that R also knows all about letters. In addition to numbers, values can be sequences of alphabetic characters, spaces, punctuation, and so on.^{1} R only recognizes these as values if the sequences are inside quotation marks:

`"Hello, world"`

This is called a “string” and is considered in R to be a single value, just like the number 1, even though it has twelve characters (ten letters, the comma, and a space). Try typing in the start of a string without finishing it: type `"Hello` (no closing quote) and press Return. Now you’ll see a `+` instead of a `>` prompt. R is expecting more input before it will try to figure out the value. In this case, you can satisfy the machine by typing a `"` and pressing Return.

In general, though, you can always bail out of a situation like that. At any time, hold down Control and type C. R will forget about the previous line and give you a new `>` prompt. (In RStudio, pressing Escape does the same thing.)
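If you want to check the claim that `"Hello, world"` has twelve characters, R’s built-in `nchar` function counts the characters in a string. (Functions are explained further down; for now, just type these in and see.)

```r
nchar("Hello, world")  # ten letters, plus the comma and the space: 12
nchar("")              # nothing between the quotes means 0 characters
```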

Finally, there is one more important kind of value to know about from the start: the *logical* value. If you have used other programming languages, you may have encountered “Boolean” values, which in R go by the names `TRUE` and `FALSE`.

**Exercise.** You should be able to guess the values of `4 > 2` and `20000000 < 2`.

The test for equality is written `==`. Try entering the following expressions:

`4 == 2`

`"I" == "I"`

`" I" == "I"`
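The last example shows that R compares strings character by character, so a single extra space makes two strings unequal. Capital and lowercase letters count as different characters too:

```r
"I" == "i"          # FALSE: a capital and a lowercase letter are different characters
"Hello" == "Hello"  # TRUE: every character matches
```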

The test for non-equality is `!=`.

**Exercise.** Come up with two true expressions using `!=`, one using numbers and one using letters.

R (unlike many other programming languages) has a very special value called `NA`. These letters are written *without* quotes, and they stand for *missing information*. Whenever we have data, we will have gaps. `NA` will help us keep track of the gaps.

**Bonus exercise.** Explore arithmetic and logic with `NA` and try to figure out why you might want things to work the way they do. Try the following:

`2 + NA`

`NA > NA`

`NA == NA`

# Variables and assignment

Values can be stored and referred to later. The storage location is called a “variable.” Putting a value in storage is called “assignment.” In R, an expression like

`x <- 2`

creates a new storage location, names it `x`, and *assigns* the value 2 to it. Whenever we use `x` without quotes in an expression, R will substitute its value.

`x * 2`

`x > 1`

We can change the value in `x` with another assignment:

`x <- 4`

`x + 1`

Programming-language variables are not exactly like variables in math (or variables in statistics). They are easier. There is no “solving for x.” R variables are just names for storage locations in computer memory. You can choose any name that starts with a letter and contains no spaces and none of the operator symbols (such as `+`, `-`, `*`, or `/`).^{2}

`my_very_nifty_variable <- "Hello"`
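Because a variable is just a named storage location, the right-hand side of an assignment can even use the variable’s own current value. In algebra, x = x + 1 would be a contradiction; in R, `x <- x + 1` simply means “compute `x + 1`, then store the result back in `x`”:

```r
x <- 4
x <- x + 1  # read the old value (4), add 1, and store 5 back in x
x           # 5
```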

# Functions

A *function* is something that takes some values as input and produces a value as output. Functions you know from high-school math work as you’d expect. Try the square-root function:

`sqrt(9)`

The name of the function (no quotes) is followed by parentheses containing the input value or values. Multiple inputs are separated by commas. The inconveniently named `paste0` function sticks two strings together:

`paste0("Good", "bye")`
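The name `paste0` makes a little more sense next to its sibling function `paste`, which by default puts a space between the pieces it joins. `paste0` joins them with nothing in between:

```r
paste("Good", "bye")   # "Good bye": paste separates the pieces with a space
paste0("Good", "bye")  # "Goodbye": paste0 uses no separator at all
```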

Everything you do in R, you do with functions. That includes some things that don’t look like math functions at all. One important thing you can do is ask for help:

`help(paste0)`

That brings up the online help for the thing between the parentheses.

Another very important thing, so important that I will ask you to do it every time you start R, is to load up a “package,” which is an add-on to R with more functions and data. If you have set things up according to my instructions above, you should be able to write

`library(dataculture)`

You’ll see some messages like `Loading required package: tidyverse`. This is normal.

# Piping

There is another way to use functions in R. It looks a little funny, but we will soon see its uses. Try:

`9 |> sqrt()`

The symbol `|>` is pronounced “pipe.” `x |> f()` has the same meaning as `f(x)`. Now try:

`"Good" |> paste0("bye")`

`x |> f(y)` has the same meaning as `f(x, y)`. Pipes are especially handy for writing expressions that apply several functions in a row:

`81 |> sqrt() |> sqrt()`
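A chain of pipes reads left to right in the order the functions are applied: start with 81, take its square root, then take the square root of that. It means the same as nesting the calls inside each other, which you read from the inside out. One caveat: `|>` is built into R only from version 4.1.0 onward, so if `9 |> sqrt()` gives you an error, check your version with `getRversion()`:

```r
81 |> sqrt() |> sqrt()    # 81, then 9, then 3
sqrt(sqrt(81))            # the same computation, read from the inside out
getRversion() >= "4.1.0"  # TRUE if your R understands |>
```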

**Exercise.** If you define three variables:

```
part1 <- "Good"
part2 <- "bye"
part3 <- " everyone"
```

How would you use two pipes and two invocations of `paste0` to produce `"Goodbye everyone"` from `part1`, `part2`, and `part3`?

# Data frames

If you’ve gotten this far, great. The last thing to try out is the way R represents a dataset. Let’s load some names data.

`library(babynames)`

Type `babynames` and press Return. We’ll spend Thursday with this data. (For more information about the data, type `help(babynames)`.) For the moment, just notice the spreadsheet-like output. This kind of value is called a “data frame” in R. Each row tells you about one name (for one sex) in one year. Each column gives one kind of information about that name. Notice that you have both numbers and strings displayed here. We will learn some very powerful functions for exploring a data frame on Thursday. In RStudio, you can explore a data frame in a spreadsheet-style window with the `View` function:^{3}

`View(babynames)`
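If you’d like to see how a data frame is put together, you can build a tiny one yourself with the built-in `data.frame` function. This is a made-up example, not part of `babynames`: each named input becomes a column, and each position across the inputs becomes a row. (`babynames` prints a little differently because it is a “tibble,” the tidyverse’s version of a data frame, but the rows-and-columns idea is the same.)

```r
# Hypothetical example: a made-up data frame, not real data
toy <- data.frame(
  position = 1:5,            # a column of numbers
  letter = head(letters, 5)  # a column of strings: "a" through "e"
)
toy        # prints like a small spreadsheet, one row per position
nrow(toy)  # 5 rows
ncol(toy)  # 2 columns
```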

# Bonus: Vectors

**This section is completely optional, but may be clarifying.**

Here’s the truth about that `[1]` you always see. R thinks of data in multiples (“data” really is plural, for R). Most values are “vectors,” which is just a fancy way of saying that R keeps track of several values together as one. For example, a sequence of numbers can be a value:

`1:4`

Each number in the result is called an “element” of the vector. `1` is element number 1, `2` is element number 2, and so on. In the backwards sequence

`4:1`

the number `4` is element number 1, `3` is element number 2, and so on.

If we want to store these values for later, R expects us to store them all under a single name, like this:

`nums <- 1:4`

The value of the variable `nums` is all four numbers. Try it:

`nums`

To get just a single element, we use square brackets:

`nums[2]`
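Two more things to try with `nums`. The built-in `length` function reports how many elements a vector has, and asking for an element beyond the end gives you `NA`, R’s marker for missing information:

```r
nums <- 1:4
length(nums)  # 4
nums[5]       # there is no fifth element, so the answer is NA
```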

Now try

`1:100`

In the output, the `[1]` and the other bracketed numbers tell you the element number of the first value on each line of output.

Culture again: vectors can have strings as elements, not just numbers. R has a built-in variable called `letters`. Type in

`letters`

**Exercise.** Get the value `"b"` by getting the appropriate element of `letters` using square brackets.

**Bonus exercise.** Write an expression for the word “no” using `paste0` and accessing elements of `letters`.

R can also have vectors of logical values, or even… vectors of other vectors (which are called “lists”).
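For a first taste of both: comparing a vector with a single number checks every element and produces a vector of logical values, and the built-in `list` function bundles vectors of different kinds into one value:

```r
1:4 > 2                            # FALSE FALSE TRUE TRUE
stuff <- list(1:3, "Hello", TRUE)  # one list holding three vectors of different kinds
length(stuff)                      # 3: the list has three elements
```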

**Double-bonus exercise.** R has a function called `c`. Try `c(1, 2)`. Try some more examples. What does `c` do?

^{1} Actually, if your system is set up right, strings can contain any Unicode character. Try `"😢"`.

^{2} Technically you can even break these rules, but I won’t tell you how. Also, it usually won’t work to try to steal the name of something that R already uses, like `NA`.

^{3} But if it’s a function, what does R print as its value? In programming terms, the spreadsheet display is called a “side effect,” and the function itself has no meaningful return value (R prints nothing and immediately gives you a new `>` prompt). **Footnote bonus exercise:** I’ve already mentioned two other functions with side effects and no obvious return value. Which are they?