Chapter 2 Vectors

Last updated: 2022-02-21 17:09:12

Aims

Our aims in this chapter are:

  • Learn how to work with R code files
  • Get to know the simplest data structure in R, the vector
  • Learn about subsetting, one of the fundamental operations with data

2.1 R code files

2.1.1 What are code files

In Chapter 1, we typed short and simple expressions into the R console. As we progress, however, the code we write will get longer and more complex. To be able to edit and save longer code, the code is kept in code files. Thus there are two “methods” for executing R code:

  • Typing the code in the console and pressing Enter
  • Sending code stored in a code file to the console (we will see how in a moment)

Either way, the code is interpreted, and we get the results or outputs (Figure 2.1).

Figure 2.1: Methods of executing R code: sending code from a code file, or typing code using the keyboard

2.1.2 Working with plain text

Computer code is stored in code files as plain text. When writing computer code, we must use a plain text editor, such as Notepad++ or RStudio (Section 1.2). A word processor, such as Microsoft Word, is not a good choice for writing code, because:

  • Documents created with a word processor contain elements other than plain text (such as highlighting, font types, sizes, and colors). These elements are not part of the code and are ignored by the interpreter, which leads to confusion.
  • Word processors can automatically correct “mistakes”, thereby introducing unintended changes in our code, such as capitalizing max(1) to Max(1).

Any plain text file can be used to store R code, though, conventionally, R code files have the *.R file extension.

2.1.3 Code files in RStudio

To start working with a code file in RStudio, we can do one of the following:

  • Create a new code file, selecting File → New File… → R Script from the menu (or pressing Ctrl+Shift+N)
  • Open an existing code file, selecting File → Open File… from the menu (or pressing Ctrl+O)

The new file, or a modified existing file, can be saved by selecting File → Save from the menu (or pressing Ctrl+S).

In RStudio, open the code file named volcano.R, which is included in the book materials.

There are several methods to execute code from a code file, by sending it to the console. The most obvious one is to copy a section of code, then paste it in the console, and press Enter, but this is not very convenient. Instead we usually do one of the following:

  • We can send a single expression, by placing the cursor on a particular line and pressing Ctrl+Enter. The expression is executed and the cursor advances to the next line, which means we can press Ctrl+Enter again to execute it too, and so on.
  • We can send a selection of several lines of code, by marking the section and pressing Ctrl+Enter.
  • We can execute all lines from the top of the code file to the line where the cursor is, by pressing Ctrl+Alt+B.

For example, we can run the file volcano.R which draws a 3D image of a volcano (Figure 2.2). The file is contained in the book materials, see Appendix A for the complete list of files and the download link.

Figure 2.2: 3D image of the volcano dataset

Try each of the above three methods to execute R code with the volcano.R code file. Running the entire code file, by selecting all code with Ctrl+A and pressing Ctrl+Enter, produces a 3D image of a volcano in the graphical output panel (Figure 2.2).

2.1.4 RStudio keyboard shortcuts

RStudio has numerous keyboard shortcuts for making it easier to edit and execute code files, some of which we already mentioned in the previous section. The most useful RStudio keyboard shortcuts are given in Table 2.1.

Table 2.1: RStudio keyboard shortcuts

Shortcut        Action
Ctrl+1          Move cursor to the code editor
Ctrl+2          Move cursor to the console
Ctrl+Enter      Run the current selection or line
Ctrl+Alt+B      Run from top to current line
Ctrl+Shift+C    Turn comment on or off
Tab             Auto-complete
Ctrl+D          Delete line
Ctrl+Shift+D    Duplicate line
Ctrl+F          Open the find and replace menu
Ctrl+S          Save

2.2 Assignment

So far we have been using R by typing expressions into the command line and observing the result on screen. That way, R functions as a “calculator”; the results are not kept in computer memory (Figure 2.3).

Figure 2.3: Simple R expressions are evaluated and printed

Storing objects in the temporary computer memory (RAM) is called assignment. In an assignment expression, we are storing an object, under a certain name, in the RAM (Figure 2.4). Assignment is done using the assignment operator. Assignment is an essential operation in programming, because it makes automation possible—reaching the goal step by step, while storing intermediate products. An assignment expression consists of:

  • The expression whose result we want to store
  • The assignment operator, = or <-
  • The name which will be assigned to the object

For example, the following expression assigns the result of the arithmetic calculation (6617747987-6617746521)/10 into a variable named rateEstimate:

rateEstimate = (6617747987 - 6617746521) / 10

Figure 2.4: An assignment expression stores an object in the RAM

When we type an object name in the console, R accesses an object stored under that name in the RAM, and calls the print function on the object (Figure 2.5):

rateEstimate
## [1] 146.6
print(rateEstimate)
## [1] 146.6

Figure 2.5: Accessing an object stored in the RAM

What happens when we assign a new value to an existing object? The old value gets deleted, and the object is associated with a new value:

x = 55
x
## [1] 55
x = "Hello"
x
## [1] "Hello"

Note the difference between the = and == operators. The = operator is an assignment operator:

one = 1
two = 2
one = two
one
## [1] 2
two
## [1] 2

while == is a conditional operator (Section 1.3.4) to test for equality:

one = 1
two = 2
one == two
## [1] FALSE

Which user-defined objects are currently in memory? The ls function returns a character vector (see Section 2.3 below) with their names:

ls()

Why did we write ls() and not ls?

2.3 Vectors

2.3.1 What is a vector?

The vector is the simplest data structure in R, and the first data structure we learn about in this book. A vector, in R, is an ordered collection of values of the same type, such as:

  • Numbers: numeric (numbers with a decimal point) or integer (whole numbers)
  • Text: character
  • Logical: logical

Recall that these are the same three types of “constant values” we saw in Chapter 1. In fact, R doesn’t have a special class for individual constant values. A constant value in R is actually represented by a vector of length 1.

The distinction between the numeric and integer classes (Section 1.3.11) is not very important for our purposes. Both of these two classes are used to represent numbers, and R automatically converts from one to another, as needed. We are mentioning both only because you may encounter either one when working with numbers. In this book, we will refer to both numeric and integer vectors as “numeric”, for convenience.
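
As a small illustration of the two classes, the following sketch uses the class function to check which class R chooses in a few cases. It borrows the : operator, which is only introduced later (Section 2.3.6.2), as a convenient way of producing an integer vector:

```r
class(c(1.5, 2, 3))  # Values typed as plain numbers are numeric
## [1] "numeric"
class(1:3)           # The : operator produces an integer vector
## [1] "integer"
class(1:3 + 0.5)     # Converted to numeric automatically, as needed
## [1] "numeric"
```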

2.3.2 The c function

A vector of length 1 can be created simply by typing a value, such as 600 or "Hello", as we have already seen (Section 1.3). A vector of length >1 can be created in several ways. The most straightforward method is to use the c function, which combines its inputs—vectors of length 1, or more—into a new vector, in the specified order. For example:

x = c(1, 2, 3)
x
## [1] 1 2 3

Note that the c function is not restricted to combining individual values (i.e., vectors of length 1). It can be used to combine any number of vectors, of any length. For example, the following expression combines four vectors—of length 3, 1, 3 and 2—into a new vector of length 9:

c(x, 84, x, c(-1, -2))
## [1]  1  2  3 84  1  2  3 -1 -2

Here is another example of using the c function, this time to combine four character values into a vector of length 4:

y = c("cat", "dog", "mouse", "apple")
y
## [1] "cat"   "dog"   "mouse" "apple"
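
Since a vector holds values of a single type (Section 2.3.1), the c function converts its inputs to a common type when they differ. The sketch below is only meant to show what to expect in that situation; the particular values are arbitrary:

```r
c(1, "cat", TRUE)  # Numeric and logical values are converted to character
## [1] "1"    "cat"  "TRUE"
c(1, TRUE, FALSE)  # Logical values are converted to numeric
## [1] 1 1 0
```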

2.3.3 Vector subsetting (individual elements)

We can access individual vector elements using the [ operator and a numeric index. That way, we can get a subset with an individual vector element:

y[1]
## [1] "cat"
y[2]
## [1] "dog"
y[3]
## [1] "mouse"
y[4]
## [1] "apple"

Note that numeric indices in R start at 1! This is unlike Python, C, JavaScript, and many other programming languages, where numeric indices start at 0.

Here is another example:

counts = c(2, 0, 3, 1, 3, 2, 9, 0, 2, 1, 11, 2)
counts[4]
## [1] 1

Note the three components of an expression for accessing a vector element:

  • The vector being subsetted (counts)
  • Square brackets ([ and ])
  • The index (4)

We can also make an assignment into a vector subset, for example to replace an individual element:

x = c(1, 2, 3)
x
## [1] 1 2 3
x[2] = 300
x
## [1]   1 300   3

In this example, we made an assignment into a subset with a single element. As we will see later on, we can assign values into a subset of any length, using the same method (Section 2.3.9).

2.3.4 Calling functions on a vector

There are numerous functions for calculating vector properties in R. The length, min, max, range, mean, and sum functions are most commonly used. Here is a demonstration of these functions:

x = c(1, 6, 3, -8, 2)
x
## [1]  1  6  3 -8  2
length(x)  # Number of elements
## [1] 5
min(x)     # Minimum
## [1] -8
max(x)     # Maximum
## [1] 6
range(x)   # Minimum and maximum
## [1] -8  6
mean(x)    # Average
## [1] 0.8
sum(x)     # Sum
## [1] 4

In what way is the range function different from the other functions shown above?

In contrast, other functions operate on each vector element separately, returning a vector of results with the same length as the input:

abs(x)   # Absolute value
## [1] 1 6 3 8 2
sqrt(x)  # Square root
## Warning in sqrt(x): NaNs produced
## [1] 1.000000 2.449490 1.732051      NaN 1.414214

Why does the output of sqrt(x) contain NaN?

Note that the last expression produced a warning. A warning, in R, signals to the user that something suspicious or notable has happened. A warning differs from an error (Section 1.3.7) in that, when a warning is raised, the expression is still executed. When an error is raised, on the other hand, code evaluation stops and the expression is not executed by the interpreter.
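
The difference can be seen by trying to store the result of each kind of expression. This is a minimal sketch: the object names warn_result and err_result are made up for illustration, and the try function (not covered in this chapter) is used only so that the code keeps running after the error. The <- operator is used inside try because = would be read there as naming a function argument:

```r
# A warning: the expression is still executed, so 'warn_result' is created
warn_result = sqrt(-4)
## Warning in sqrt(-4): NaNs produced
warn_result
## [1] NaN

# An error: evaluation stops, so 'err_result' is never created
try(err_result <- sqrt("four"), silent = TRUE)
exists("err_result")
## [1] FALSE
```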

2.3.5 The recycling rule (arithmetic)

Binary operations, such as arithmetic (Section 1.3.2) and conditional (Section 1.3.4) operators, when applied on two vectors, are done element-by-element. The result is, then, a vector of the respective results. For example, the following expression:

c(11, 2, 3) + c(10, 20, 30)
## [1] 21 22 33

is interpreted as:

c(11+10, 2+20, 3+30)
## [1] 21 22 33

Here are three more examples, demonstrating the element-by-element behavior using other operators, *, >, and <:

c(11, 2, 3) * c(10, 20, 30)
## [1] 110  40  90
c(11, 2, 3) > c(10, 20, 30)
## [1]  TRUE FALSE FALSE
c(11, 2, 3) < c(10, 20, 30)
## [1] FALSE  TRUE  TRUE

What happens when the input vector lengths do not match? In that case, the shorter vector gets “recycled”. For example, when one of the vectors is of length 3 and the other vector is of length 6, the shorter vector (of length 3) is replicated two times, until it matches the longer vector (Figure 2.6). Thus, the expression:

c(1, 2, 3)          + c(1, 2, 3, 4, 5, 6)
## [1] 2 4 6 5 7 9

is equivalent to the expression:

c(1, 2, 3, 1, 2, 3) + c(1, 2, 3, 4, 5, 6)
## [1] 2 4 6 5 7 9

Figure 2.6: Vector recycling

When one of the vectors is of length 1 and the other is of length 4, the shorter vector (of length 1) is replicated 4 times:

2             * c(1, 2, 3, 4)
## [1] 2 4 6 8
c(2, 2, 2, 2) * c(1, 2, 3, 4)
## [1] 2 4 6 8

When one of the vectors is of length 2 and the other is of length 6, the shorter vector (of length 2) is replicated 3 times:

c(10, 100)                   + c(1, 2, 3, 4, 5, 6)
## [1]  11 102  13 104  15 106
c(10, 100, 10, 100, 10, 100) + c(1, 2, 3, 4, 5, 6)
## [1]  11 102  13 104  15 106

What happens when the longer vector length is not a multiple of the shorter vector length? In that case, recycling is “incomplete”, as indicated by a warning message. In the following example, the shorter vector is recycled “1.5 times”:

c(1, 2)    * c(1, 2, 3)
## Warning in c(1, 2) * c(1, 2, 3): longer object length is not a multiple of
## shorter object length
## [1] 1 4 3
c(1, 2, 1) * c(1, 2, 3)
## [1] 1 4 3

Incomplete recycling is rarely something we want to do in practice. Therefore, the latter warning is usually an indication that something is wrong in our code.

2.3.6 Consecutive and repetitive vectors

2.3.6.1 Introduction

Other than the c function (Section 2.3.2), there are three commonly used methods for creating consecutive or repetitive vectors:

  • The : operator
  • The seq function
  • The rep function

2.3.6.2 Consecutive vectors

The : operator is used to create a vector of consecutive values in steps of 1:

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10

or in steps of -1:

55:43
##  [1] 55 54 53 52 51 50 49 48 47 46 45 44 43

The seq function provides a more general way to create a consecutive vector with any step size, not necessarily 1 or -1. The three most useful parameters of the seq function are:

  • from—Where to start
  • to—When to end
  • by—Step size

For example:

seq(from = 100, to = 150, by = 10)
## [1] 100 110 120 130 140 150
seq(from = 100, to = 80, by = -5)
## [1] 100  95  90  85  80

The above examples of the seq function are also examples of passing more than one argument in a function call (Section 1.3.6). You may also have noticed that the arguments are named. We are going to elaborate on the syntax rules of using more than one argument, in a function call in R, in a moment (Section 2.3.7).

2.3.6.3 Repetitive vectors

The rep function replicates its argument to create a repetitive vector. Its most useful parameters are:

  • x—What to replicate
  • times—How many times to replicate the entire vector
  • each—How many times to replicate each element

For example, using times we can replicate the entire vector the specified number of times:

rep(x = 22, times = 10)
##  [1] 22 22 22 22 22 22 22 22 22 22
rep(x = c(18, 0, 9), times = 4)
##  [1] 18  0  9 18  0  9 18  0  9 18  0  9

Alternatively, we can use each to replicate each element the specified number of times:

rep(x = c(18, 0, 9), each = 4)
##  [1] 18 18 18 18  0  0  0  0  9  9  9  9
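
The two parameters can also be combined in one call. In that case, as far as the documented behavior of rep goes, each is applied first and the result is then replicated times times:

```r
rep(x = c(18, 0, 9), each = 2, times = 2)
##  [1] 18 18  0  0  9  9 18 18  0  0  9  9
```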

2.3.7 Function calls

Using the seq function, we will demonstrate three properties of function calls. First, we can omit parameter names as long as the arguments are passed in the default order. For example, the following two function calls are identical, because the default order of the (first three) seq function parameters is from, to and by:

seq(from = 5, to = 10, by = 1)
## [1]  5  6  7  8  9 10
seq(5, 10, 1)
## [1]  5  6  7  8  9 10

Second, we can use any argument order as long as parameter names are specified. The following three function calls are identical, even though argument order is not the same, since the arguments are named:

seq(to = 10, by = 1, from = 5)
## [1]  5  6  7  8  9 10
seq(by = 1, from = 5, to = 10)
## [1]  5  6  7  8  9 10
seq(from = 5, by = 1, to = 10)
## [1]  5  6  7  8  9 10

Third, we can omit arguments for parameters that have a default value, specified as part of the function definition. For example, the by parameter of seq has a default value of 1:

seq(5, 10, 1)
## [1]  5  6  7  8  9 10
seq(5, 10)
## [1]  5  6  7  8  9 10

The parameters of a particular function, their order, and their default values (if any), can be found in the help file of every function (Section 1.3.12):

?seq

2.3.8 Vector subsetting (general)

So far, we created vector subsets using a numeric index which consists of a single value (Section 2.3.3), as in:

x = c(43, 85, 10)
x[3]
## [1] 10

We can also use a vector of length >1 as an index. For example, the following expression returns the first and second elements of x, since the index is the vector c(1,2) (which we create using the : operator) (Section 2.3.6.2):

x[1:2]
## [1] 43 85

Note that the vector of indices can consist of any combination of indices whatsoever. It does not have to be consecutive, and it can even include repetitions:

x[c(1, 1, 3, 2)]
## [1] 43 43 10 85

Here is another example (Figure 2.7):

counts = c(2, 0, 3, 1, 3)
counts[1:3]
## [1] 2 0 3

Figure 2.7: Vector subsetting with a vector of indices (1:3)

And here is one more example where the index is not consecutive:

counts = c(2, 0, 3, 1, 3, 2, 9, 0, 2, 1, 11, 2)
counts[c(1:3, 7:9)]
## [1] 2 0 3 9 0 2

Note, again, the components of the subsetting expression:

  • The vector being subsetted (counts)
  • Square brackets ([ and ])
  • The index (c(1:3, 7:9))

For the next examples, let’s create a vector of all even numbers between 1 and 100:

x = seq(2, 100, 2)
x
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [39]  78  80  82  84  86  88  90  92  94  96  98 100

Can you think of another way to create the above vector, using the : and * operators?

What is the meaning of the numbers in square brackets when printing the vector?

How can we check how many elements x has? Recall the length function (Section 2.3.4):

length(x)
## [1] 50

Using this knowledge, here are two expressions that return the value of the last element in x:

x[50]
## [1] 100
x[length(x)]
## [1] 100

Which of the last two expressions is preferable? Why?

Which index can we use to get back the entire vector? We can use an index that contains all of the vector element indices, starting from 1 and up to length(x):

x[1:length(x)]
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [39]  78  80  82  84  86  88  90  92  94  96  98 100

What index can we use to get the entire vector except for the last two elements?

What index can we use to get a reversed vector?

Note that there is a built-in function named rev for reversing a vector:

rev(x)
##  [1] 100  98  96  94  92  90  88  86  84  82  80  78  76  74  72  70  68  66  64
## [20]  62  60  58  56  54  52  50  48  46  44  42  40  38  36  34  32  30  28  26
## [39]  24  22  20  18  16  14  12  10   8   6   4   2

Note that, when requesting elements beyond the vector length, we get NA (Not Available) values. For example:

x[55]
## [1] NA
x[1:80]
##  [1]   2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38
## [20]  40  42  44  46  48  50  52  54  56  58  60  62  64  66  68  70  72  74  76
## [39]  78  80  82  84  86  88  90  92  94  96  98 100  NA  NA  NA  NA  NA  NA  NA
## [58]  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA
## [77]  NA  NA  NA  NA

2.3.9 The recycling rule (assignment)

Earlier, we saw how the recycling rule applies to arithmetic and conditional operators (Section 2.3.5). The rule also applies to assignment.

For example, here we assign a vector of length 1 (NA) into a subset of length 6 (c(1:3,7:9)). As a result, NA is replicated six times, to match the subset:

counts = c(2, 0, 3, 1, 3, 2, 9, 0, 2, 1, 11, 2)
counts
##  [1]  2  0  3  1  3  2  9  0  2  1 11  2
counts[c(1:3, 7:9)] = NA
counts
##  [1] NA NA NA  1  3  2 NA NA NA  1 11  2

Here, c(NA,99) is replicated three times, also to match the subset of length 6:

counts[c(1:3, 7:9)] = c(NA, 99)
counts
##  [1] NA 99 NA  1  3  2 99 NA 99  1 11  2
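
As with arithmetic (Section 2.3.5), recycling in assignment can be incomplete, in which case R raises a warning. The following sketch uses a made-up vector v to show what happens when a vector of length 2 is assigned into a subset of length 3:

```r
v = c(1, 2, 3, 4, 5)
# Raises a warning: "number of items to replace is not a multiple of
# replacement length"
v[1:3] = c(10, 20)
v
## [1] 10 20 10  4  5
```

The assignment is still carried out, with c(10,20) recycled “1.5 times”, which is rarely what we want.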

2.3.10 Logical vectors

2.3.10.1 Creating logical vectors

The third common type of vectors, in addition to numeric and character vectors, are logical vectors. A logical vector is composed of logical values (Section 1.3.4), TRUE and FALSE (or NA). For example:

c(TRUE, FALSE, FALSE)
## [1]  TRUE FALSE FALSE
rep(TRUE, 7)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Usually, we will not be creating logical vectors manually, but through applying a conditional operator (Section 1.3.4) on a numeric or character vector. For example:

x = 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x >= 7
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

Note how the recycling rule applies to conditional operators in the above expression.
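
To make the recycling explicit, the comparison can be written out with the value 7 replicated ten times, which gives the same result:

```r
x = 1:10
x           >= 7
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
x           >= rep(7, 10)
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
```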

An important property of logical vectors is that, when arithmetic operations are applied, the logical vector is automatically converted to a numeric one, where TRUE becomes 1 and FALSE becomes 0. For example:

TRUE + FALSE
## [1] 1
sum(x >= 7)
## [1] 4
mean(x >= 7)
## [1] 0.4

What is the meaning of the values 4 and 0.4 in the above example?

2.3.10.2 Subsetting with logical vectors

So far, we used a numeric vector of indices when subsetting a vector (Sections 2.3.3 and 2.3.8). A logical vector can also be used as an index for subsetting. When using a logical vector of indices, the subset contains those elements which are in the same positions as the TRUE elements in the index. This means that the logical vector of indices should match the length of the vector being subsetted. If it is shorter, the logical vector is recycled.

For example:

counts = c(2, 0, 3, 1, 3)
counts[c(TRUE, FALSE, TRUE, FALSE, FALSE)]
## [1] 2 3
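
When the logical index is shorter than the vector being subsetted, it is recycled. As a brief sketch, an index of length 2 applied to the same vector is recycled to length 5, so that every other element is selected:

```r
counts = c(2, 0, 3, 1, 3)
counts[c(TRUE, FALSE)]
## [1] 2 3 3
counts[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
## [1] 2 3 3
```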

Here is another example, where the logical vector of indices is created from the same vector being subsetted:

counts[counts < 3]
## [1] 2 0 1

In this example, the logical vector counts<3:

counts < 3
## [1]  TRUE  TRUE FALSE  TRUE FALSE

specifies whether to include each of the elements of counts in the resulting subset (Figure 2.8).

What does the expression counts[counts<3] do, in plain language?

Figure 2.8: Subsetting with a logical vector

Here are some more examples of subsetting with a logical index:

x = 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x[x >= 3]         # Elements of 'x' greater than or equal to 3
## [1]  3  4  5  6  7  8  9 10
x[x != 2]         # Elements of 'x' not equal to 2 
## [1]  1  3  4  5  6  7  8  9 10
x[x > 4 | x < 2]  # Elements of 'x' greater than 4 or smaller than 2
## [1]  1  5  6  7  8  9 10
x[x > 4 & x < 2]  # Elements of 'x' greater than 4 and smaller than 2
## integer(0)

What does the output integer(0), which we got in the last expression, mean?

The next example is slightly more complex; we select the elements of z whose square is larger than 8:

z = c(5, 2, -3, 8)
z[z^2 > 8]
## [1]  5 -3  8

Let’s go over this step-by-step. First, z^2 gives a vector of squared z values (2 is recycled):

z^2
## [1] 25  4  9 64

Then, each of the squares is compared to 8 (8 is recycled):

z^2 > 8
## [1]  TRUE FALSE  TRUE  TRUE

Finally, the logical vector z^2>8 is used for subsetting z.

2.3.11 Missing values

The is.na function is used to detect missing (NA) values (Section 1.3.5) in a vector. The is.na function:

  • accepts a vector, of any type, and
  • returns a logical vector, with TRUE in place of NA values and FALSE in place of non-NA values.

For example, suppose we have a vector x where some of the values are missing:

x = c(28, 58, NA, 31, 39, NA, 9)
x
## [1] 28 58 NA 31 39 NA  9

The is.na function can be used to detect which values are missing:

is.na(x)
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

How can we use the above expression to subset the non-missing values of x?

A common mistake is to use comparison with NA to detect missing values, instead of is.na, as in:

x == NA
## [1] NA NA NA NA NA NA NA

Can you explain why the above doesn’t work “as intended”?
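
Because is.na returns a logical vector, it can be combined with sum and mean (Section 2.3.10.1) to summarize how much of a vector is missing. A short sketch, using the same vector x:

```r
x = c(28, 58, NA, 31, 39, NA, 9)
sum(is.na(x))   # Number of missing values
## [1] 2
mean(is.na(x))  # Proportion of missing values
## [1] 0.2857143
```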

Many of the functions that summarize vector properties (Section 2.3.4), such as sum and mean, have a parameter called na.rm. The na.rm parameter is used to determine whether NA values are excluded from the calculation. The default is na.rm=FALSE, meaning that NA values are not excluded. For example:

x = c(28, 58, NA, 31, 39, NA, 9)
mean(x)                # Mean including NA values
## [1] NA
mean(x, na.rm = TRUE)  # Mean excluding NA values
## [1] 33

Why do we get NA in the first expression?

What do you think will be the result of length(x): NA or 7? Execute the expression to check your answer.

How can we replace the NA values in x with the mean of its non-NA values?
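
The na.rm parameter works the same way in other summary functions, such as sum and max. For instance, with the same vector x:

```r
x = c(28, 58, NA, 31, 39, NA, 9)
sum(x)                # Sum including NA values
## [1] NA
sum(x, na.rm = TRUE)  # Sum excluding NA values
## [1] 165
max(x, na.rm = TRUE)  # Maximum excluding NA values
## [1] 58
```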

2.4 Additional useful functions

2.4.1 any and all

Sometimes we want to figure out whether a logical vector:

  • contains at least one TRUE value; or
  • is entirely composed of TRUE values.

We can use the any and all functions, respectively, to do those things.

The any function returns TRUE if at least one of the input vector values is TRUE, otherwise it returns FALSE. For example, let’s take a numeric vector x:

x = c(2, 6, 2, 3, 0, 1, 6)
x
## [1] 2 6 2 3 0 1 6

The expression any(x > 5) returns TRUE, which means that the vector x > 5 contains at least one TRUE value, i.e., at least one element of x is greater than 5:

x > 5
## [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
any(x > 5)
## [1] TRUE

The expression any(x > 88) returns FALSE, which means that the vector x > 88 contains no TRUE values, i.e., none of the elements of x are greater than 88:

x > 88
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
any(x > 88)
## [1] FALSE

The all function returns TRUE if all of the input vector values are TRUE, otherwise it returns FALSE. For example, the expression all(x > 5) returns FALSE, which means that the vector x > 5 contains at least one FALSE value, i.e., not all elements of x are greater than 5:

x > 5
## [1] FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
all(x > 5)
## [1] FALSE

The expression all(x > -1) returns TRUE, which means that x > -1 is composed entirely of TRUE values, i.e., all elements of x are greater than -1:

x > -1
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
all(x > -1)
## [1] TRUE

In a way, any and all mirror each other:

  • any returns TRUE if the logical vector contains at least one TRUE value.
  • all returns FALSE if the logical vector contains at least one FALSE value.
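
This relation can be checked directly. The sketch below assumes the ! (“not”) operator, which flips TRUE and FALSE values, and shows that each of the two functions can be expressed through the other:

```r
x = c(2, 6, 2, 3, 0, 1, 6)
any(x > 5)      # At least one element is greater than 5
## [1] TRUE
!all(!(x > 5))  # Same: not all elements fail the condition
## [1] TRUE
all(x > 5)      # Not all elements are greater than 5
## [1] FALSE
!any(!(x > 5))  # Same: at least one element fails the condition
## [1] FALSE
```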

Which expression can we use to get TRUE if vector x contains at least one NA value, or FALSE if it does not?

2.4.2 which

The which function converts a logical vector to a numeric one, with the indices of TRUE values. That way, we can find out the index of values that satisfy a given condition. For example, considering the vector x:

x
## [1] 2 6 2 3 0 1 6

the expression which(x > 2.3) returns the indices of TRUE elements in x > 2.3, i.e., the indices of x elements which are greater than 2.3:

x > 2.3
## [1] FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE
which(x > 2.3)
## [1] 2 4 7
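
The indices returned by which can themselves be used for subsetting. For a condition that contains no NA values, as here, the result is the same as subsetting with the logical vector directly (Section 2.3.10.2):

```r
x = c(2, 6, 2, 3, 0, 1, 6)
x[which(x > 2.3)]  # Subsetting with a numeric index
## [1] 6 3 6
x[x > 2.3]         # Subsetting with a logical index
## [1] 6 3 6
```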

2.4.3 which.min and which.max

Related to which (Section 2.4.2) are functions which.min and which.max. The latter two functions return the index of the (first!) minimal or maximal value in a vector, respectively. For example, considering the vector x:

x
## [1] 2 6 2 3 0 1 6

using which.min we can find out that the minimal value of x is in the 5th position:

which.min(x)
## [1] 5

while using which.max we can find out that the maximal value of x is in the 2nd position:

which.max(x)
## [1] 2
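
As a quick check, subsetting x with these indices gives back the minimal and maximal values themselves, that is, the same values returned by min and max (Section 2.3.4):

```r
x = c(2, 6, 2, 3, 0, 1, 6)
x[which.min(x)]
## [1] 0
min(x)
## [1] 0
x[which.max(x)]
## [1] 6
max(x)
## [1] 6
```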

What expression can we use to find all indices (2, 7) of the maximal value in x?

2.4.4 The order function

The order function returns ordered vector indices, based on the order of vector values. In other words, order gives the index of the smallest value, the index of the second smallest value, etc., up to the index of the largest value. For example, given the vector x:

x
## [1] 2 6 2 3 0 1 6

order(x) returns the indices 1:length(x), ordered from smallest to largest value:

order(x)
## [1] 5 6 1 3 4 2 7

This result tells us that the 5th element of x is the smallest, the 6th is the second smallest, and so on.

We can also get the reverse order with decreasing=TRUE:

order(x, decreasing = TRUE)
## [1] 2 7 4 1 3 6 5
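
The order function is not limited to numeric vectors. As a small sketch, applying it to the character vector y from Section 2.3.2 gives the indices in alphabetical order:

```r
y = c("cat", "dog", "mouse", "apple")
order(y)
## [1] 4 1 2 3
```

This result tells us that the 4th element ("apple") comes first alphabetically, followed by the 1st ("cat"), and so on.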

How can we get a sorted vector of elements from x, as shown below, using the order function?

## [1] 0 1 2 2 3 6 6

2.4.5 paste and paste0

The paste function is used to “paste” text values. Its sep parameter determines the separating character(s), with the default being sep=" " (a space). For example:

paste("There are", "5", "books.")
## [1] "There are 5 books."
paste("There are", "5", "books.", sep = "_")
## [1] "There are_5_books."

Non-character vectors are automatically converted to character before pasting:

paste("There are", 80, "books.")
## [1] "There are 80 books."

The recycling rule applies in paste too:

paste("image", 1:5, ".tif", sep = "")
## [1] "image1.tif" "image2.tif" "image3.tif" "image4.tif" "image5.tif"

Finally, the paste0 function is a shortcut for paste with sep="":

paste0("image", 1:5, ".tif")
## [1] "image1.tif" "image2.tif" "image3.tif" "image4.tif" "image5.tif"
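
In the above examples, the vectors of length 1 ("image" and ".tif") were recycled to match 1:5. Recycling works the same way for longer vectors. For instance, in the following sketch a vector of length 2 is recycled to match a vector of length 4:

```r
paste0(c("a", "b"), 1:4)
## [1] "a1" "b2" "a3" "b4"
```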