# Chapter 2 Vectors

*Last updated: 2020-09-28 17:43:58*

## Aims

Our aims in this chapter are:

- Learn how to work with R code files
- Get to know the simplest data structure in R, the vector
- Learn about subsetting, one of the fundamental operations with data

## 2.1 Editing code

### 2.1.1 Using code files

In Chapter 1, we typed short and simple expressions into the R console. As we progress, however, the code we write will get longer and more complex. To be able to edit and save longer code, the code is kept in code *files*.

Computer code is stored in code files as **plain text**. When writing computer code, we must use a plain text editor, such as **Notepad++** or **RStudio** (Section 1.2). A word processor, such as **Microsoft Word**, is not a good choice for writing code, because:

- Documents created with a word processor contain elements other than plain text (such as highlighting, font types, sizes, and colors), which the interpreter cannot process, leading to confusion.
- Word processors can automatically correct “mistakes”, thereby introducing unintended changes in our code, such as capitalizing `max(1)` → `Max(1)`.

Any plain text file can be used to store R code, though, conventionally, R code files have the `*.R` file extension. Complete code files can be executed with `source` (Figure 2.1).

For example, we can run the file `volcano.R`, which draws a 3D image of a volcano (Figure 2.2).
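Since `volcano.R` itself is not listed here, the following is only a minimal sketch of how `source` works. It writes a tiny, hypothetical script to a temporary file and then executes the whole file:

```r
# Hypothetical script: write two lines of R code to a temporary *.R file
script_path = tempfile(fileext = ".R")
writeLines(c("total = 1 + 2", "print(total)"), script_path)

# Execute the complete file; objects it creates remain available afterwards
source(script_path)
total
```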

Selected parts of code can be executed by marking the section and pressing **Ctrl+Enter** (in RStudio). Executing a single expression can be done by placing the cursor on the particular line and pressing **Ctrl+Enter**.

### 2.1.2 RStudio keyboard shortcuts

RStudio has numerous keyboard shortcuts for making it easier to edit and execute code files. Some useful RStudio keyboard shortcuts are given in Table 2.1.

Shortcut | Action |
---|---|
Alt+Shift+K | List of all shortcuts |
Ctrl+1 | Moving cursor to the code editor |
Ctrl+2 | Moving cursor to the console |
Ctrl+Enter | Running the current selection or line |
Ctrl+Shift+P | Re-running the last selection |
Ctrl+Alt+B | Running from top to current line |
Ctrl+Shift+C | Turn comment on or off |
Tab | Auto-complete |
Ctrl+D | Delete line |
Ctrl+Shift+D | Duplicate line |
Ctrl+F | Find and replace menu |
Ctrl+S | Save |

## 2.2 Assignment

So far we have been using R by typing expressions into the command line and observing the result on screen. That way, R functions as a “calculator”; the results are not kept in computer memory (Figure 2.3).

Storing objects in the temporary computer memory (RAM) is called assignment. In an assignment expression, we are storing an object, under a certain name, in the RAM (Figure 2.4). Assignment is done using the assignment operator. Assignment is an essential operation in programming, because it makes automation possible—reaching the goal step by step, while storing intermediate products. An assignment expression consists of:

- The **expression** whose result we want to store
- The assignment **operator**, `=` or `<-`
- The **name** which will be assigned to the object

For example:
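A minimal sketch of an assignment expression; the names `rate` and `rate2` and the values are illustrative only:

```r
# Store the result of an expression under the name `rate`
rate = 5 * 2

# The `<-` operator does the same thing
rate2 <- 5 * 2
```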

When we type an object name in the console, R accesses an object stored under that name in the RAM, and calls the `print` function on the object (Figure 2.5):
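For instance, assuming an object named `x` with an arbitrary value:

```r
x = 55     # illustrative value
x          # typing the name prints the object
print(x)   # the explicit equivalent
```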

What happens when we assign a new value to an existing object? The old object gets deleted, and its name now points to the new value:
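A short sketch of re-assignment, using illustrative values:

```r
x = 55
x = "Hello"   # the name `x` now refers to the new value
x
```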

Note the difference between the `==` and `=` operators! `=` is an assignment operator, while `==` is a logical operator for comparison:
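The two operators can be contrasted as follows (the name `one` is just an illustration):

```r
one = 1    # assignment: stores the value 1 under the name `one`
one == 1   # comparison: returns TRUE
one == 2   # comparison: returns FALSE
```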

Which user-defined objects are currently in memory? The `ls` function returns a character vector with their names:
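For example, after creating two illustrative objects, their names appear in the result of `ls()` (along with any other objects already in memory):

```r
a = 1
b = "text"
ls()   # a character vector that includes "a" and "b"
```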

Why did we write `ls()` and not `ls`?

## 2.3 Vectors

### 2.3.1 What is a vector?

A vector, in R, is an *ordered* collection of values of the *same type*, such as:

- **Numbers**: `numeric` or `integer`
- **Text**: `character`
- **Logical**: `logical`

Recall that these are the same three types of “constant values” we saw in Chapter 1. In fact, a constant value is a vector of length 1.

### 2.3.2 The `c` function

Vectors can be created with the `c` function, which *combines* the given vectors in the given order:
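A minimal sketch with illustrative values:

```r
x = c(1, 2, 3)      # three vectors of length 1 combined into one
x
c(x, 84, c(0, 5))   # existing vectors can be combined as well
```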

Here is another example, with `character` values:
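The words below are arbitrary and serve only as an illustration:

```r
y = c("cat", "dog", "mouse", "apple")
y
```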

### 2.3.3 Vector subsetting (individual elements)

We can access individual vector *elements* using the `[` operator and an **index**; in other words, we can get a subset that contains an individual vector element:
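For instance, with an illustrative numeric vector:

```r
x = c(10, 20, 30)
x[1]   # first element
x[2]   # second element
```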

Note that the index starts at `1`!

Here is another example:
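This sketch uses an illustrative `character` vector:

```r
y = c("cat", "dog", "mouse", "apple")
y[3]   # third element
```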

Note the components of an expression for accessing a vector element (Figure 2.6).

We can also assign new values into a vector subset:
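A minimal sketch, with illustrative values, of replacing one element:

```r
x = c(10, 20, 30)
x[2] = 300   # replace the second element
x
```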

In this example, we made an assignment into a subset with a *single* element. As we will see later on, we can assign values into a subset of *any* length using the same method (Section 2.3.9).

### 2.3.4 Calling functions on a vector

There are various functions for calculating vector properties. For example:
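The following sketch applies several such functions to an illustrative vector:

```r
x = c(1, 6, 3, -2, 5)   # illustrative values
length(x)   # number of elements
min(x)      # smallest value
max(x)      # largest value
range(x)    # smallest and largest values
mean(x)     # average
sum(x)      # total
```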

Other functions operate on each vector element, returning a vector of results having the same length as the input:
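For instance, assuming a vector `x` that includes a negative value:

```r
x = c(1, 6, 3, -2, 5)   # illustrative values
abs(x)    # absolute value of each element
sqrt(x)   # square root of each element; R warns about the negative input
```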

Why does the output of `sqrt(x)` contain `NaN`?

### 2.3.5 The recycling rule (arithmetic)

Binary operations, such as arithmetic and logical operators, applied on two vectors are done **element-by-element**, and a vector of the results is returned:
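A short sketch with two illustrative vectors of equal length:

```r
c(1, 2, 3) * c(10, 20, 30)   # 10 40 90
c(1, 2, 3) > c(3, 2, 1)      # FALSE FALSE TRUE
```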

What happens when the input vector lengths do not match? The shorter vector gets **“recycled”**. For example, when one of the vectors is of length 3 and the other vector is of length 6, then the shorter vector (of length 3) is replicated 2 times until it matches the longer vector (Figure 2.7):
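With illustrative values, the two expressions below give the same result:

```r
c(1, 2, 3) + c(1, 2, 3, 4, 5, 6)            # shorter vector is recycled
c(1, 2, 3, 1, 2, 3) + c(1, 2, 3, 4, 5, 6)   # the recycling written out
```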

When one of the vectors is of length 1 and the other is of length 4, the shorter vector (of length 1) is replicated 4 times:
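For instance, with illustrative values:

```r
2 * c(1, 2, 3, 4)   # 2 4 6 8
```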

When one of the vectors is of length 2 and the other is of length 6, the shorter vector (of length 2) is replicated 3 times:
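Again with illustrative values:

```r
c(10, 100) * c(1, 2, 3, 4, 5, 6)   # 10 200 30 400 50 600
```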

When the length of the longer vector is not a multiple of the length of the shorter one, the result comes with a warning message that recycling is “incomplete”:
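In this sketch (illustrative values), a vector of length 2 is combined with a vector of length 3, so the result is still computed but R issues a warning:

```r
c(1, 2) * c(1, 2, 3)   # 1 4 3, plus a warning about the mismatched lengths
```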

### 2.3.6 Consecutive and repetitive vectors

#### 2.3.6.1 Introduction

Other than the `c` function, there are three commonly used methods for creating **consecutive** or **repetitive** vectors:

- The `:` operator
- The `seq` function
- The `rep` function

#### 2.3.6.2 Consecutive vectors

The `:` operator is used to create a vector of consecutive values in steps of `1` or `-1`:
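For example, with arbitrarily chosen end points:

```r
1:10    # increasing, in steps of 1
55:43   # decreasing, in steps of -1
```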

The `seq` function provides a more general way to create a consecutive vector with *any* step size. The three most useful parameters of the `seq` function are:

- `from`: Where to start
- `to`: When to end
- `by`: Step size

For example:
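The values below are illustrative:

```r
seq(from = 100, to = 150, by = 10)    # 100 110 120 130 140 150
seq(from = 190, to = 150, by = -10)   # 190 180 170 160 150
```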

#### 2.3.6.3 Repetitive vectors

The `rep` function *replicates* its argument to create a repetitive vector:

- `x`: What to replicate
- `times`: How many times to repeat `x`

For example:
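The values below are illustrative:

```r
rep(x = 22, times = 4)            # 22 22 22 22
rep(x = c(18, 0, 9), times = 3)   # 18 0 9 18 0 9 18 0 9
```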

### 2.3.7 Function calls

Using the `seq` function, we will demonstrate three properties of function calls. First, we can omit parameter names as long as the arguments are passed in the default order:
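With illustrative argument values, the following two calls are equivalent:

```r
seq(from = 5, to = 10, by = 1)
seq(5, 10, 1)   # same result, arguments matched by position
```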

Second, we can use any argument order as long as parameter names are specified:
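For instance, this call (illustrative values) names every argument and passes them in a different order:

```r
seq(to = 10, by = 1, from = 5)   # 5 6 7 8 9 10
```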

Third, we can omit parameters that have a **default** argument as part of the function definition. For example, the `by` parameter of `seq` has a default value of `1`:
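A one-line sketch with illustrative end points:

```r
seq(5, 10)   # `by` omitted, so the step size is 1
```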

To find out the parameters of a particular function, their order, or their default values, we can look into the documentation:

### 2.3.8 Vector subsetting (general)

So far, we created vector subsets using a `numeric` index which consists of a single value, such as:
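A brief reminder, using an illustrative vector:

```r
x = c(43, 85, 10)
x[3]   # 10
```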

We can also use a vector of length >1 as an index. For example:
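In this sketch (illustrative values), the index is a vector of length 2:

```r
x = c(43, 85, 10)
x[1:2]   # 43 85
```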

Note that the vector does not need to be consecutive, and can include repetitions:
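For instance, with an illustrative vector and index:

```r
x = c(43, 85, 10)
x[c(1, 1, 3, 2)]   # 43 43 10 85
```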

Here is another example (Figure 2.8):

And here is one more example (Figure 2.9):

For the next examples, let’s create a vector of all *even* numbers between 1 and 100:

```
x = seq(2, 100, 2)
x
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
## [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
## [39] 78 80 82 84 86 88 90 92 94 96 98 100
```

What is the meaning of the numbers in square brackets when printing the vector?

How many elements does `x` have?
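One way to check, repeating the definition of `x` so that the snippet runs on its own:

```r
x = seq(2, 100, 2)
length(x)   # 50
```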

What is the value of the last element in `x`?
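Two possible expressions, sketched here with `x` redefined so that the snippet runs on its own:

```r
x = seq(2, 100, 2)
x[50]          # index written out by hand
x[length(x)]   # index computed from the vector itself
```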

Which of the last two expressions is preferable and why?

How can we get the entire vector using subsetting with a `numeric` index?

```
x[1:length(x)]
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
## [20] 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76
## [39] 78 80 82 84 86 88 90 92 94 96 98 100
```

How can we get the entire vector *except for* the last element?

```
x[1:(length(x)-1)]
## [1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
## [26] 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98
```

What `numeric` index can we use to get a reversed vector?

Note that there is a special function named `rev` for reversing a vector:

```
rev(x)
## [1] 100 98 96 94 92 90 88 86 84 82 80 78 76 74 72 70 68 66 64
## [20] 62 60 58 56 54 52 50 48 46 44 42 40 38 36 34 32 30 28 26
## [39] 24 22 20 18 16 14 12 10 8 6 4 2
```

When requesting an index beyond vector length, we get `NA` (*Not Available*). For example:
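A short sketch, with `x` redefined so that the snippet runs on its own:

```r
x = seq(2, 100, 2)   # 50 elements
x[51]                # NA
x[c(50, 51)]         # 100 NA
```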

### 2.3.9 The recycling rule (assignment)

Earlier, we saw the recycling rule with arithmetic operators. The rule also applies to assignment. For example, here `NA` is replicated six times, to match the subset length 6:
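A sketch of such an assignment; the choice of the first six elements is an assumption made for illustration:

```r
x = seq(2, 100, 2)
x[1:6] = NA   # one value recycled across six positions
x[1:8]        # NA NA NA NA NA NA 14 16
```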

Here, `c(NA, 99)` is replicated three times, also to match the subset length 6:
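Again, the choice of the first six elements is an assumption made for illustration:

```r
x = seq(2, 100, 2)
x[1:6] = c(NA, 99)   # two values recycled across six positions
x[1:8]               # NA 99 NA 99 NA 99 14 16
```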

### 2.3.10 Logical vectors

#### 2.3.10.1 Creating logical vectors

The third common type of vectors are `logical` vectors. A logical vector is composed of `logical` values: `TRUE` and `FALSE`. For example:
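Two illustrative ways to create a logical vector by hand:

```r
c(TRUE, FALSE, FALSE)
rep(TRUE, 3)
```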

Usually, we will not be creating `logical` vectors *manually*, but through applying a logical operator on a `numeric` or `character` vector. Note how the recycling rule applies to logical operators as well:
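For instance, with an illustrative vector `x`, the single value on the right-hand side is recycled to match the length of `x`:

```r
x = 1:10
x >= 3   # FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
x != 2   # TRUE for every element except the second
```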

When arithmetic operations are applied to a `logical` vector, the `logical` vector is *converted* to a numeric one, where `TRUE` becomes `1` and `FALSE` becomes `0`. For example:
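The original listing is not shown here; the sketch below uses an assumed vector and condition, chosen so that the results are `4` and `0.4`:

```r
x = 1:10       # assumed values
x > 6          # TRUE for the elements 7, 8, 9 and 10
sum(x > 6)     # 4
mean(x > 6)    # 0.4
```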

What is the meaning of the values `4` and `0.4` in the above example?

#### 2.3.10.2 Subsetting with logical vectors

A `logical` vector can be used as an index for subsetting. For example:
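A minimal sketch; the values of `counts` are assumed for illustration:

```r
counts = c(2, 0, 3, 1, 3, 2, 9, 0, 2, 1, 11, 2)   # assumed values
counts < 3           # logical vector, one value per element
counts[counts < 3]   # 2 0 1 2 0 2 1 2
```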

The logical vector `counts<3` specifies whether to *include* each of the elements of `counts` in the resulting subset (Figure 2.10).

Here are some more examples of subsetting with a `logical` index:
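The vector and conditions below are assumed for illustration; the last condition is chosen so that no element satisfies it:

```r
x = 1:10    # assumed values
x[x > 7]    # 8 9 10
x[x == 4]   # 4
x[x > 20]   # integer(0)
```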

What does the output `integer(0)` we got in the last expression mean? Why do you think we got this result?

The next example is slightly more complex; we select the elements of `z` whose square is larger than 8:
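A sketch of the complete expression, with assumed values for `z`:

```r
z = c(5, 2, -3, 8)   # assumed values
z[z^2 > 8]           # 5 -3 8
```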

Let’s go over this step-by-step. First, `z^2` gives a vector of squared `z` values (`2` is recycled):
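With assumed values for `z`:

```r
z = c(5, 2, -3, 8)   # assumed values
z^2                  # 25 4 9 64
```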

Then, each of the squares is compared to 8 (`8` is recycled):
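With assumed values for `z`:

```r
z = c(5, 2, -3, 8)   # assumed values
z^2 > 8              # TRUE FALSE TRUE TRUE
```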

Finally, the `logical` vector `z^2>8` is used for subsetting `z`.

### 2.3.11 Missing values

The `is.na` function is used to detect *missing* (`NA`) values in a vector:

- Accepts a **vector** of any type
- Returns a **logical vector** with `TRUE` in place of `NA` values and `FALSE` in place of non-`NA` values

For example:
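The vector below is assumed for illustration:

```r
x = c(28, 58, NA, 31, 39, NA, 9)   # assumed values
is.na(x)   # FALSE FALSE TRUE FALSE FALSE TRUE FALSE
```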

Many functions that summarize vector properties, such as `sum` and `mean`, have a parameter called `na.rm`. The `na.rm` parameter is used to determine whether `NA` values are *excluded* from the calculation. The default is `na.rm=FALSE`, meaning that `NA` values are *not* excluded. For example:
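A sketch using an assumed vector `x` that contains two missing values:

```r
x = c(28, 58, NA, 31, 39, NA, 9)   # assumed values
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 33
```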

Why do we get `NA` in the first expression?

What do you think will be the result of `length(x)`?

How can we replace the `NA` values in `x` with the mean of its non-`NA` values?

## 2.4 Some useful functions

### 2.4.1 `any` and `all`

Sometimes we want to figure out whether a `logical` vector:

- contains **at least one** `TRUE` value; or
- is **entirely** composed of `TRUE` values.

We can use the `any` and `all` functions, respectively, to do those things.

The `any` function returns `TRUE` if at least *one* of the input vector values is `TRUE`, otherwise it returns `FALSE`. For example, let’s take a numeric vector `x`:

The expression `any(x > 5)` returns `TRUE`, which means that the vector `x > 5` contains at least one `TRUE` value, i.e., at least one element of `x` is greater than `5`:
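The values of `x` are not listed here; the sketch assumes a vector that is consistent with the results quoted in this section:

```r
x = c(2, 6, 2, 3, 0, 1, 6)   # assumed values
x > 5        # FALSE TRUE FALSE FALSE FALSE FALSE TRUE
any(x > 5)   # TRUE
```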

The expression `any(x > 88)` returns `FALSE`, which means that the vector `x > 88` contains no `TRUE` values, i.e., none of the elements of `x` is greater than `88`:
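With an assumed vector `x` consistent with the results quoted in this section:

```r
x = c(2, 6, 2, 3, 0, 1, 6)   # assumed values
any(x > 88)   # FALSE
```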

The `all` function returns `TRUE` if *all* of the input vector values are `TRUE`, otherwise it returns `FALSE`. For example, the expression `all(x > 5)` returns `FALSE`, which means that the vector `x > 5` contains at least one `FALSE` value, i.e., *not* all elements of `x` are greater than `5`:
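With an assumed vector `x` consistent with the results quoted in this section:

```r
x = c(2, 6, 2, 3, 0, 1, 6)   # assumed values
all(x > 5)   # FALSE
```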

The expression `all(x > -1)` returns `TRUE`, which means that `x > -1` is composed entirely of `TRUE` values, i.e., all elements of `x` are greater than `-1`:
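With an assumed vector `x` consistent with the results quoted in this section:

```r
x = c(2, 6, 2, 3, 0, 1, 6)   # assumed values
all(x > -1)   # TRUE
```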

In a way, `any` and `all` are *inverse*:

- `any` determines if the logical vector contains at least one `TRUE` value.
- `all` determines if the logical vector contains at least one `FALSE` value.

### 2.4.2 `which`

The `which` function converts a `logical` vector to a `numeric` one with the *indices* of `TRUE` values. That way, we can find out the index of values that satisfy a given condition. For example, considering the vector `x`:

the expression `which(x > 2.3)` returns the indices of `TRUE` elements in `x > 2.3`, i.e., the indices of `x` elements which are greater than `2.3`:
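The values of `x` are not listed here; the sketch assumes a vector that is consistent with the results quoted elsewhere in this chapter:

```r
x = c(2, 6, 2, 3, 0, 1, 6)   # assumed values
x > 2.3          # FALSE TRUE FALSE TRUE FALSE FALSE TRUE
which(x > 2.3)   # 2 4 7
```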

### 2.4.3 `which.min` and `which.max`

Related functions `which.min` and `which.max` return the index of the (first!) *minimal* or *maximal* value in a vector, respectively. For example, considering the vector `x`:

using `which.min` we can find out that the *minimal* value of `x` is in the 5th position:
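The sketch assumes a vector `x` that is consistent with the positions quoted in the text:

```r
x = c(2, 6, 2, 3, 0, 1, 6)   # assumed values
which.min(x)   # 5
```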

while using `which.max` we can find out that the *maximal* value of `x` is in the 2nd position:
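Again assuming a vector `x` that is consistent with the positions quoted in the text:

```r
x = c(2, 6, 2, 3, 0, 1, 6)   # assumed values
which.max(x)   # 2 (the first of the two maximal values)
```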

What expression can we use to find *all* indices (`2`, `7`) of the maximal value in `x`?

### 2.4.4 The `order` function

The `order` function returns ordered vector indices, based on the order of vector values. In other words, `order` gives the index of the smallest value, the index of the second smallest value, etc., up to the index of the largest value. For example, given the vector `x`:

`order(x)` returns the indices `1:length(x)`, ordered from smallest to largest value:
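The sketch assumes a vector `x` that is consistent with the sorted output and positions quoted in this section:

```r
x = c(2, 6, 2, 3, 0, 1, 6)   # assumed values
order(x)   # 5 6 1 3 4 2 7
```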

This result tells us that the 5th element of `x` is the smallest, the 6th is the second smallest, and so on.

We can also get the *reverse* order with `decreasing=TRUE`:
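Again assuming a vector `x` that is consistent with the values quoted in this section:

```r
x = c(2, 6, 2, 3, 0, 1, 6)    # assumed values
order(x, decreasing = TRUE)   # indices of the largest values come first
```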

How can we get a *sorted* vector of elements from `x`, as shown below, using the `order` function?

`## [1] 0 1 2 2 3 6 6`

### 2.4.5 `paste` and `paste0`

The `paste` function is used to **“paste”** text values. Its `sep` parameter determines the separating character(s), with default `sep=" "` (space). For example:

```
paste("There are", "5", "books.")
## [1] "There are 5 books."
paste("There are", "5", "books.", sep = "_")
## [1] "There are_5_books."
```

Non-character vectors are automatically converted to `character` before pasting:
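For instance, here a `numeric` and a `logical` value (both illustrative) are pasted together with text:

```r
paste("There are", 5, "books.")         # "There are 5 books."
paste("Is this statement", TRUE, "?")   # "Is this statement TRUE ?"
```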

The recycling rule applies in `paste` too:

```
paste("image", 1:5, ".tif", sep = "")
## [1] "image1.tif" "image2.tif" "image3.tif" "image4.tif" "image5.tif"
```

The `paste0` function is a shortcut for `paste` with `sep=""`:
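A sketch that builds the same kind of file names with `paste0`:

```r
paste0("image", 1:5, ".tif")
# equivalent to: paste("image", 1:5, ".tif", sep = "")
```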