Chapter 2 Vectors

Last updated: 2020-08-12 00:35:48

Aims

Our aims in this chapter are:

Learn how to work with R code files
Get to know the simplest data structure in R, the vector
Learn about subsetting, one of the fundamental operations with data

2.1 Editing code

2.1.1 Using code files

In Chapter 1, we typed short and simple expressions into the R console. As we progress, however, the code we write will get longer and more complex. To be able to edit and save longer code, we keep it in code files.

Computer code is stored in code files as plain text. When writing computer code, we must use a plain text editor, such as Notepad++ or RStudio (Section 1.2). A word processor, such as Microsoft Word, is not a good choice for writing code, because:

Documents created with a word processor contain elements other than plain text (such as highlighting, font types, sizes and colors, etc.). These elements cannot be processed by the interpreter, which leads to confusion.
Word processors can automatically correct “mistakes”, thereby introducing unintended changes in our code, such as capitalizing: max(1) → Max(1).

Any plain text file can be used to store R code, though, conventionally, R code files have the *.R file extension. Complete code files can be executed with source (Figure 2.1).

Figure 2.1: Methods of executing R code

For example, we can run the file volcano.R which draws a 3D image of a volcano (Figure 2.2).

source("_book/data/volcano.R")

Figure 2.2: 3D image of the volcano dataset
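
As a minimal sketch of how source executes a complete code file, the following creates a small code file and then runs it. The temporary file and the object names a and b are made up for this illustration and are not part of the book's materials:

```r
# Write two lines of R code into a temporary plain text file
# (hypothetical file contents, for illustration only)
code_file = tempfile(fileext = ".R")
writeLines(c("a = 10", "b = a * 2"), code_file)

# Execute the complete file; the objects it creates are now available
source(code_file)
b
## [1] 20
```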

Selected parts of code can be executed by marking the section and pressing Ctrl+Enter (in RStudio). Executing a single expression can be done by placing the cursor on the particular line and pressing Ctrl+Enter.

2.1.2 RStudio keyboard shortcuts

RStudio has numerous keyboard shortcuts for making it easier to edit and execute code files. Some useful RStudio keyboard shortcuts are given in Table 2.1.

Table 2.1: RStudio keyboard shortcuts
Shortcut	Action
Alt+Shift+K	List of all shortcuts
Ctrl+1	Moving cursor to the code editor
Ctrl+2	Moving cursor to the console
Ctrl+Enter	Running the current selection or line
Ctrl+Shift+P	Re-running the last selection
Ctrl+Alt+B	Running from top to current line
Ctrl+Shift+C	Turn comment on or off
Tab	Auto-complete
Ctrl+D	Delete line
Ctrl+Shift+D	Duplicate line
Ctrl+F	Find and replace menu
Ctrl+S	Save

2.2 Assignment

So far we have been using R by typing expressions into the command line and observing the result on screen. That way, R functions as a “calculator”; the results are not kept in computer memory (Figure 2.3).

Figure 2.3: Simple R expressions are evaluated and printed

Storing objects in the temporary computer memory (RAM) is called assignment. In an assignment expression, we store an object, under a certain name, in the RAM (Figure 2.4). Assignment is done using the assignment operator. It is an essential operation in programming, because it makes automation possible: we can reach a goal step by step, while storing intermediate products. An assignment expression consists of, from left to right:

The name which will be assigned to the object
The assignment operator, = or <-
The expression whose result we want to store

For example:

rateEstimate = (6617747987 - 6617746521) / 10

Figure 2.4: An assignment expression stores an object in the RAM
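
Both assignment operators, = and <-, can be used interchangeably in an expression like the one above. The short sketch below stores the same result twice, once with each operator; the names rateA and rateB are chosen only for this illustration:

```r
# The same expression, stored once with each assignment operator
rateA = (6617747987 - 6617746521) / 10
rateB <- (6617747987 - 6617746521) / 10

# Both names now refer to equal values
rateA == rateB
## [1] TRUE
```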

When we type an object name in the console, R accesses an object stored under that name in the RAM, and calls the print function on the object (Figure 2.5):

rateEstimate
## [1] 146.6

print(rateEstimate)
## [1] 146.6

Figure 2.5: Accessing an object stored in the RAM

What happens when we assign a new value to an existing object? The old value is discarded, and the name now points to the new value:

x = 55
x
## [1] 55

x = "Hello"
x
## [1] "Hello"

Note the difference between the == and = operators! = is an assignment operator:

one = 1
two = 2
one = two
one
## [1] 2
two
## [1] 2

while == is a logical operator for comparison:

one = 1
two = 2
one == two
## [1] FALSE

Which user-defined objects are currently in memory? The ls function returns a character vector with their names:

ls()

Why did we write ls() and not ls?

2.3 Vectors

2.3.1 What is a vector?

A vector, in R, is an ordered collection of values of the same type, such as:

Numbers: numeric or integer
Text: character
Logical: logical

Recall that these are the same three types of “constant values” we saw in Chapter 1. In fact, a constant value is a vector of length 1.
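
The claim that a constant value is a vector of length 1 can be checked directly. This is a small sketch using the length function (introduced in more detail later in this chapter) and identical, which tests whether two objects are exactly the same:

```r
# A single constant value has length 1, whatever its type
length(5)
## [1] 1
length("Hello")
## [1] 1

# A constant is the same object as a vector built from that one value
identical(5, c(5))
## [1] TRUE
```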

2.3.2 The c function

Vectors can be created with the c function, which combines the given vectors in the given order:

x = c(1, 2, 3)
x
## [1] 1 2 3

c(x, 5)
## [1] 1 2 3 5

Here is another example, with character values:

y = c("cat", "dog", "mouse", "apple")
y
## [1] "cat"   "dog"   "mouse" "apple"
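
The requirement that all values in a vector share one type becomes visible when c is given values of different types. In the sketch below (the object name mixed is chosen just for illustration), R converts all of the values to character, so that the result is still a vector of a single type:

```r
# Numeric, character and logical values combined into one vector
mixed = c(1, "cat", TRUE)
mixed
## [1] "1"    "cat"  "TRUE"

# The class function reports the type of the resulting vector
class(mixed)
## [1] "character"
```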

2.3.3 Vector subsetting (individual elements)

We can access individual vector elements using the [ operator and an index. In other words, we get a subset that contains a single vector element:

y[2]
## [1] "dog"
y[3]
## [1] "mouse"

Note that the index starts at 1!

Here is another example:

counts = c(2, 0, 3, 1, 3, 2, 9, 0, 2, 1, 11, 2)
counts[4]
## [1] 1

Note the components of an expression for accessing a vector element (Figure 2.6).

Figure 2.6: Components of expression to access vector elements

We can also assign new values into a vector subset:

x = c(1, 2, 3)
x
## [1] 1 2 3

x[2] = 300
x
## [1]   1 300   3

In this example, we made an assignment into a subset with a single element. As we will see later on, we can assign values into a subset of any length using the same method (Section 2.3.9).

2.3.4 Calling functions on a vector

There are various functions for calculating vector properties. For example:

x = c(1, 6, 3, -8, 2)
x
## [1]  1  6  3 -8  2

length(x)  # Number of elements
## [1] 5

min(x)     # Minimum
## [1] -8

max(x)     # Maximum
## [1] 6

range(x)   # Minimum, maximum
## [1] -8  6

mean(x)    # Average
## [1] 0.8

sum(x)     # Sum
## [1] 4

Other functions operate on each vector element, returning a vector of results having the same length as the input:

sqrt(x)  # Square root
## [1] 1.000000 2.449490 1.732051      NaN 1.414214

Why does the output of sqrt(x) contain NaN?

abs(x)   # Absolute value
## [1] 1 6 3 8 2

2.3.5 The recycling rule (arithmetic)

Binary operators, such as the arithmetic and logical operators, work on two vectors element by element, and return a vector of the results:

c(1, 2, 3) + c(10, 20, 30)
## [1] 11 22 33

c(1, 2, 3) * c(10, 20, 30)
## [1] 10 40 90

c(1, 2, 3) > c(10, 20, 30)
## [1] FALSE FALSE FALSE

c(1, 2, 3) < c(10, 20, 30)
## [1] TRUE TRUE TRUE

What happens when the input vector lengths do not match? The shorter vector gets “recycled”. For example, when one of the vectors is of length 3 and the other vector is of length 6, then the shorter vector (of length 3) is replicated 2 times until it matches the longer vector (Figure 2.7):

c(1, 2, 3)          + c(1, 2, 3, 4, 5, 6)
## [1] 2 4 6 5 7 9

c(1, 2, 3, 1, 2, 3) + c(1, 2, 3, 4, 5, 6)
## [1] 2 4 6 5 7 9

Figure 2.7: Vector recycling

When one of the vectors is of length 1 and the other is of length 4, the shorter vector (of length 1) is replicated 4 times:

2             * c(1, 2, 3, 4)
## [1] 2 4 6 8

c(2, 2, 2, 2) * c(1, 2, 3, 4)
## [1] 2 4 6 8

When one of the vectors is of length 2 and the other is of length 6, the shorter vector (of length 2) is replicated 3 times:

c(10, 100)                   + c(1, 2, 3, 4, 5, 6)
## [1]  11 102  13 104  15 106

c(10, 100, 10, 100, 10, 100) + c(1, 2, 3, 4, 5, 6)
## [1]  11 102  13 104  15 106

When the length of the longer vector is not a multiple of the length of the shorter one, the result comes with a warning message that recycling is “incomplete”:

c(1, 2)    * c(1, 2, 3)
## Warning in c(1, 2) * c(1, 2, 3): longer object length is not a multiple of
## shorter object length
## [1] 1 4 3

c(1, 2, 1) * c(1, 2, 3)
## [1] 1 4 3

2.3.6 Consecutive and repetitive vectors

2.3.6.1 Introduction

Other than the c function, there are three commonly used methods for creating consecutive or repetitive vectors:

The : operator
The seq function
The rep function

2.3.6.2 Consecutive vectors

The : operator is used to create a vector of consecutive values in steps of 1 or -1:

1:10   # Steps of 1
##  [1]  1  2  3  4  5  6  7  8  9 10

55:43  # Steps of -1
##  [1] 55 54 53 52 51 50 49 48 47 46 45 44 43

The seq function provides a more general way to create a consecutive vector with any step size. The three most useful parameters of the seq function are:

from: Where to start
to: Where to end
by: Step size

For example:

seq(from = 100, to = 150, by = 10)
## [1] 100 110 120 130 140 150

seq(from = 100, to = 80, by = -5)
## [1] 100  95  90  85  80

2.3.6.3 Repetitive vectors

The rep function replicates its argument to create a repetitive vector. Its two most commonly used parameters are:

x: What to replicate
times: How many times to repeat x

For example:

rep(x = 22, times = 10)
##  [1] 22 22 22 22 22 22 22 22 22 22

rep(x = c(18, 0, 9), times = 3)
## [1] 18  0  9 18  0  9 18  0  9

2.3.7 Function calls

Using the seq function, we will demonstrate three properties of function calls. First, we can omit parameter names as long as the arguments are passed in the default order:

seq(from = 5, to = 10, by = 1)
## [1]  5  6  7  8  9 10

seq(5, 10, 1)
## [1]  5  6  7  8  9 10

Second, we can use any argument order as long as parameter names are specified:

seq(to = 10, by = 1, from = 5)
## [1]  5  6  7  8  9 10

seq(by = 1, from = 5, to = 10)
## [1]  5  6  7  8  9 10

seq(from = 5, by = 1, to = 10)
## [1]  5  6  7  8  9 10

Third, we can omit parameters that have a default argument as part of the function definition. For example, the by parameter of seq has a default value of 1:

seq(5, 10, 1)
## [1]  5  6  7  8  9 10

seq(5, 10)
## [1]  5  6  7  8  9 10

To find out what the parameters of a particular function are, as well as their order and their default values, we can look into the documentation:

# ?seq

2.3.8 Vector subsetting (general)

So far, we created vector subsets using a numeric index which consists of a single value, such as:

x = c(43, 85, 10)
x[3]
## [1] 10

We can also use a vector of length >1 as an index. For example:

x[1:2]
## [1] 43 85

Note that the vector does not need to be consecutive, and can include repetitions:

x[c(1, 1, 3, 2)]
## [1] 43 43 10 85

Here is another example (Figure 2.8):

counts = c(2, 0, 3, 1, 3)
counts[1:3]
## [1] 2 0 3

Figure 2.8: Vector subsetting with a vector of indices (1:3)

And here is one more example (Figure 2.9):

counts = c(2, 0, 3, 1, 3, 2, 9, 0, 2, 1, 11, 2)
counts[c(1:3, 7:9)]
## [1] 2 0 3 9 0 2

Figure 2.9: Vector subsetting with a vector of indices (c(1:3, 7:9))

For the next examples, let’s create a vector of all even numbers between 1 and 100:

x = seq(2, 100, 2)
x
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [39]  78  80  82  84  86  88  90  92  94  96  98 100

What is the meaning of the numbers in square brackets when printing the vector?

How many elements does x have?

length(x)
## [1] 50

What is the value of the last element in x?

x[50]
## [1] 100

x[length(x)]
## [1] 100

Which of the last two expressions is preferable and why?

How can we get the entire vector using subsetting with a numeric index?

x[1:length(x)]
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [39]  78  80  82  84  86  88  90  92  94  96  98 100

How can we get the entire vector except for the last element?

x[1:(length(x)-1)]
##  [1]  2  4  6  8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
## [26] 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98

What numeric index can we use to get a reversed vector?

Note that there is a special function named rev for reversing a vector:

rev(x)
##  [1] 100  98  96  94  92  90  88  86  84  82  80  78  76  74  72  70  68  66  64
## [20]  62  60  58  56  54  52  50  48  46  44  42  40  38  36  34  32  30  28  26
## [39]  24  22  20  18  16  14  12  10   8   6   4   2
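
One possible answer to the question above, given here as a sketch rather than the only solution, is to use a decreasing index that runs from the last position down to 1. The name reversed is chosen only for this illustration:

```r
x = seq(2, 100, 2)

# Index from the last position down to the first
reversed = x[length(x):1]
reversed[1:3]
## [1] 100  98  96

# The result is the same as the one returned by rev
identical(reversed, rev(x))
## [1] TRUE
```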

When requesting an index beyond vector length, we get NA (Not Available). For example:

x[55]
## [1] NA

x[1:80]
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [39]  78  80  82  84  86  88  90  92  94  96  98 100  NA  NA  NA  NA  NA  NA  NA
## [58]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## [77]  NA  NA  NA  NA

2.3.9 The recycling rule (assignment)

Earlier, we saw the recycling rule with arithmetic operators. The rule also applies to assignment. For example, here NA is replicated six times, to match the subset length 6:

counts = c(2, 0, 3, 1, 3, 2, 9, 0, 2, 1, 11, 2)
counts
##  [1]  2  0  3  1  3  2  9  0  2  1 11  2

counts[c(1:3, 7:9)] = NA
counts
##  [1] NA NA NA  1  3  2 NA NA NA  1 11  2

Here, c(NA, 99) is replicated three times, also to match the subset length 6:

counts[c(1:3, 7:9)] = c(NA, 99)
counts
##  [1] NA 99 NA  1  3  2 99 NA 99  1 11  2

2.3.10 Logical vectors

2.3.10.1 Creating logical vectors

The third common type of vector is the logical vector. A logical vector is composed of logical values: TRUE and FALSE. For example:

c(TRUE, FALSE, FALSE)
## [1]  TRUE FALSE FALSE

rep(TRUE, 7)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Usually, we will not be creating logical vectors manually, but through applying a logical operator on a numeric or character vector. Note how the recycling rule applies to logical operators as well:

x = 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10

x >= 7
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

When arithmetic operations are applied to a logical vector, the logical vector is converted to a numeric one, where TRUE becomes 1 and FALSE becomes 0. For example:

sum(x >= 7)
## [1] 4

mean(x >= 7)
## [1] 0.4

What is the meaning of the values 4 and 0.4 in the above example?
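
The conversion of logical values to numbers can also be made visible explicitly. The sketch below uses the as.numeric function, which is not otherwise covered in this section, to show the 0 and 1 values that sum and mean operate on:

```r
x = 1:10

# Explicit conversion: FALSE becomes 0 and TRUE becomes 1
as.numeric(x >= 7)
##  [1] 0 0 0 0 0 0 1 1 1 1

# The same conversion happens implicitly in arithmetic
TRUE + TRUE
## [1] 2
```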

2.3.10.2 Subsetting with logical vectors

A logical vector can be used as an index for subsetting. For example:

counts = c(2, 0, 3, 1, 3)

counts < 3
## [1]  TRUE  TRUE FALSE  TRUE FALSE

counts[counts < 3]
## [1] 2 0 1

The logical vector counts<3 specifies whether to include each of the elements of counts in the resulting subset (Figure 2.10).

Figure 2.10: Subsetting with a logical vector

Here are some more examples of subsetting with a logical index:

x = 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10

x[x >= 3]  # Elements of 'x' greater than or equal to 3
## [1]  3  4  5  6  7  8  9 10

x[x != 2]  # Elements of 'x' not equal to 2 
## [1]  1  3  4  5  6  7  8  9 10

x[x > 4 | x < 2]  # Elements of 'x' greater than 4 OR smaller than 2
## [1]  1  5  6  7  8  9 10

x[x > 4 & x < 2]  # Elements of 'x' greater than 4 AND smaller than 2
## integer(0)

What does the output integer(0) we got in the last expression mean? Why do you think we got this result?

The next example is slightly more complex; we select the elements of z whose square is larger than 8:

z = c(5, 2, -3, 8)
z[z^2 > 8]
## [1]  5 -3  8

Let’s go over this step-by-step. First, z^2 gives a vector of squared z values (2 is recycled):

z^2
## [1] 25  4  9 64

Then, each of the squares is compared to 8 (8 is recycled):

z^2 > 8
## [1]  TRUE FALSE  TRUE  TRUE

Finally, the logical vector z^2>8 is used for subsetting z.

2.3.11 Missing values

The is.na function is used to detect missing (NA) values in a vector:

Accepts a vector of any type
Returns a logical vector with TRUE in place of NA values and FALSE in place of non-NA values

For example:

x = c(28, 58, NA, 31, 39, NA, 9)
x
## [1] 28 58 NA 31 39 NA  9

is.na(x)
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

Many functions that summarize vector properties, such as sum and mean, have a parameter called na.rm. The na.rm parameter is used to determine whether NA values are excluded from the calculation. The default is na.rm=FALSE, meaning that NA values are not excluded. For example:

x = c(28, 58, NA, 31, 39, NA, 9)
mean(x)
## [1] NA
mean(x, na.rm = TRUE)
## [1] 33

Why do we get NA in the first expression?

What do you think will be the result of length(x)?

How can we replace the NA values in x with the mean of its non-NA values?
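
One possible approach to the last question, shown here as a sketch, combines three things from this chapter: logical subsetting with is.na, assignment into a subset, and the recycling rule (the single mean value is recycled to fill both NA positions):

```r
x = c(28, 58, NA, 31, 39, NA, 9)

# Assign the mean of the non-NA values into the NA positions
x[is.na(x)] = mean(x, na.rm = TRUE)
x
## [1] 28 58 33 31 39 33  9
```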

2.4 Some useful functions

2.4.1 any and all

Sometimes we want to figure out whether a logical vector:

contains at least one TRUE value; or
is entirely composed of TRUE values.

We can use the any and all functions, respectively, to do those things.

The any function returns TRUE if at least one of the input vector values is TRUE, otherwise it returns FALSE. For example, let’s take a numeric vector x:

x = c(2, 6, 2, 3, 0, 1, 6)
x
## [1] 2 6 2 3 0 1 6

The expression any(x > 5) returns TRUE, which means that the vector x > 5 contains at least one TRUE value, i.e., at least one element of x is greater than 5:

x > 5
## [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
any(x > 5)
## [1] TRUE

The expression any(x > 88) returns FALSE, which means that the vector x > 88 contains no TRUE values, i.e., none of the elements of x is greater than 88:

x > 88
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
any(x > 88)
## [1] FALSE

The all function returns TRUE if all of the input vector values are TRUE, otherwise it returns FALSE. For example, the expression all(x > 5) returns FALSE, which means that the vector x > 5 contains at least one FALSE value, i.e., not all elements of x are greater than 5:

x > 5
## [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
all(x > 5)
## [1] FALSE

The expression all(x > -1) returns TRUE, which means that x > -1 is composed entirely of TRUE values, i.e., all elements of x are greater than -1:

x > -1
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
all(x > -1)
## [1] TRUE

In a way, any and all are mirror images of each other:

any returns TRUE if the logical vector contains at least one TRUE value.
all returns FALSE if the logical vector contains at least one FALSE value.
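
This relation can be illustrated with the negation operator !, which flips TRUE and FALSE. In the sketch below, each of the two functions is expressed through the other one:

```r
x = c(2, 6, 2, 3, 0, 1, 6)

# "All values are TRUE" is the same as "no value is FALSE"
all(x > 5)
## [1] FALSE
!any(!(x > 5))
## [1] FALSE

# "At least one value is TRUE" is the same as "not all values are FALSE"
any(x > 5)
## [1] TRUE
!all(!(x > 5))
## [1] TRUE
```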

2.4.2 which

The which function converts a logical vector to a numeric one with the indices of TRUE values. That way, we can find out the index of values that satisfy a given condition. For example, considering the vector x:

x
## [1] 2 6 2 3 0 1 6

the expression which(x > 2.3) returns the indices of TRUE elements in x > 2.3, i.e., the indices of x elements which are greater than 2.3:

x > 2.3
## [1] FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE
which(x > 2.3)
## [1] 2 4 7
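
Since which returns a numeric index, its result can itself be used for subsetting. The following sketch shows that, for this vector, subsetting with the numeric index from which gives the same elements as subsetting with the logical vector directly:

```r
x = c(2, 6, 2, 3, 0, 1, 6)

# Subsetting with the numeric index returned by which
x[which(x > 2.3)]
## [1] 6 3 6

# Subsetting with the logical vector itself
x[x > 2.3]
## [1] 6 3 6
```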

2.4.3 which.min and which.max

Related functions which.min and which.max return the index of the (first!) minimal or maximal value in a vector, respectively. For example, considering the vector x:

x
## [1] 2 6 2 3 0 1 6

using which.min we can find out that the minimal value of x is in the 5th position:

which.min(x)
## [1] 5

while using which.max we can find out that the maximal value of x is in the 2nd position:

which.max(x)
## [1] 2

What expression can we use to find all indices (2, 7) of the maximal value in x?
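
One possible answer, given as a sketch, is to compare each element with the maximum (the single value returned by max is recycled) and pass the resulting logical vector to which:

```r
x = c(2, 6, 2, 3, 0, 1, 6)

# Indices of all elements equal to the maximal value
which(x == max(x))
## [1] 2 7
```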

2.4.4 The order function

The order function returns ordered vector indices, based on the order of vector values. In other words, order gives the index of the smallest value, the index of the second smallest value, etc., up to the index of the largest value. For example, given the vector x:

x
## [1] 2 6 2 3 0 1 6

order(x) returns the indices 1:length(x), ordered from smallest to largest value:

order(x)
## [1] 5 6 1 3 4 2 7

This result tells us that the 5th element of x is the smallest, the 6th is the second smallest, and so on.

We can also get the reverse order with decreasing=TRUE:

order(x, decreasing = TRUE)
## [1] 2 7 4 1 3 6 5

How can we get a sorted vector of elements from x, as shown below, using the order function?

## [1] 0 1 2 2 3 6 6
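
One possible answer, shown as a sketch, is to use the result of order as a numeric index for subsetting x. The sort function, which is not otherwise covered in this section, returns the same sorted vector directly:

```r
x = c(2, 6, 2, 3, 0, 1, 6)

# Subsetting x with the indices returned by order
x[order(x)]
## [1] 0 1 2 2 3 6 6

# The sort function gives the same result
identical(x[order(x)], sort(x))
## [1] TRUE
```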

2.4.5 paste and paste0

The paste function is used to “paste” text values. Its sep parameter determines the separating character(s), with default sep=" " (space). For example:

paste("There are", "5", "books.")
## [1] "There are 5 books."
paste("There are", "5", "books.", sep = "_")
## [1] "There are_5_books."

Non-character vectors are automatically converted to character before pasting:

paste("There are", 80, "books.")
## [1] "There are 80 books."

The recycling rule applies in paste too:

paste("image", 1:5, ".tif", sep = "")
## [1] "image1.tif" "image2.tif" "image3.tif" "image4.tif" "image5.tif"

The paste0 function is a shortcut for paste with sep="":

paste0("image", 1:5, ".tif")
## [1] "image1.tif" "image2.tif" "image3.tif" "image4.tif" "image5.tif"
