Chapter 5 Matrices and rasters
Last updated: 2021-03-31 00:23:39
Aims
Our aims in this chapter are:
- Start working with spatial data (rasters)
- Install and use packages beyond “base R”
- Introduce the basic
matrix
andarray
data structures, and their analogous spatial data structure (classstars
) for single band and multi-band rasters, respectively - Learn to access the cell values and other properties of rasters
- Learn to read and write raster data
We will use the following R packages:
stars
mapview
cubeview
5.1 Matrices
5.1.1 What is a matrix
?
A matrix
is a two-dimensional (like a data.frame
) collection of values of the same type (like a vector, and unlike a data.frame
). The number of values in all columns of a matrix is equal, and the same can be said about the rows. It is important to know how to work with matrices because it is a commonly used data structure, with many uses in data processing and analysis, including spatial data. For example, many R function accept a matrix
as an argument, or return a matrix
as a returned object, e.g., st_intersects
(Section 8.3.3). Moreover, a matrix
is used to store single-band raster values in a stars
raster object, which we learn about later on in this chapter (Section 5.3.8.2).
5.1.2 Creating a matrix
A matrix
can be created with the matrix
function. The matrix
function accepts the following arguments:
data
—A vector of the values to fill into the matrixnrow
—The number of rowsncol
—The number of columnsbyrow
—Whether the matrix is filled by column (FALSE
, the default) or by row (TRUE
)
For example:
= matrix(1:6, nrow = 2, ncol = 3)
x
x## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
Note that16 the class of matrix
objects is a vector of length two, with the values "matrix"
and "array"
:
class(x)
## [1] "matrix" "array"
This implies the fact that the matrix
class inherits (Section 1.1.6) from the more general array
class (Section 5.2).
The nrow
and ncol
parameters determine the number of rows and number of columns, respectively. When only one of them is specified, the other is automatically determined based on the length of the data
vector:
matrix(1:6, nrow = 2)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
matrix(1:6, ncol = 2)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
Create a
matrix
with 3 rows and 4 columns which contains the numbers 12-1 in decreasing order.
What do you think will happen when we try to create a matrix with less, or more,
data
values than matrix sizenrow*ncol
? Run the following expressions to find out.
matrix(12:1, ncol = 4, nrow = 2)
matrix(12:1, ncol = 4, nrow = 4)
Create a \(3\times3\) matrix where all values are \(1/9\).
Finally, the byrow
parameter determines the direction of filling the matrix with data
values. In both cases the filling starts from the top-left corner (i.e., row 1, column 1), however with byrow=FALSE
the matrix is filled one column at a time (the default), while with byrow=TRUE
the matrix is filled one row at a time. For example:
matrix(1:12, nrow = 3) ## the default, byrow=FALSE
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
matrix(1:12, nrow = 3, byrow = TRUE)
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
5.1.3 matrix
properties
5.1.3.1 Dimensions
To demonstrate several functions for working with matrices, let’s define a matrix
named x
as shown above (Section 5.1.2):
= matrix(1:6, nrow = 2)
x
x## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
The length
function returns the number of values in a matrix
:
length(x)
## [1] 6
Just like with a data.frame
(Section 4.1.4.1), the nrow
and ncol
functions return the number of rows and columns in a matrix
, respectively:
nrow(x)
## [1] 2
ncol(x)
## [1] 3
Also like with a data.frame
, the dim
function gives both dimensions of the matrix
as a vector of length 2, i.e., number of rows and columns, respectively:
dim(x)
## [1] 2 3
For example, R has a built-in dataset named volcano
, which is a matrix
of surface elevation. The sample script volcano.R
, used in Section 2.1.1 to demontrate working with R code files, creates a 3D image of elevation based on that matrix (Figure 2.2).
Find out what are the number of elements, rows and columns in the built-in
matrix
namedvolcano
.
5.1.3.2 Row and column names
Like a data.frame
(Section 4.1.4.2), matrix
objects also have row and column names which can be accessed or modified using the rownames
and colnames
functions, respectively. Unlike data.frame
row and column names, which are mandatory, matrix
row and column names are optional. For example, matrices created with matrix
initially do not have row and column names:
rownames(x)
## NULL
colnames(x)
## NULL
The matrix
row and column names can be initialized, or modified, by assignment to the rownames
and colnames
properties:
rownames(x) = c("a", "b")
colnames(x) = c("var1", "var2", "var3")
x## var1 var2 var3
## a 1 3 5
## b 2 4 6
5.1.4 matrix
conversions
5.1.4.1 matrix
→ vector
The as.vector
function converts a matrix
to a vector:
= matrix(1:6, ncol = 3, byrow = TRUE)
x
x## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
as.vector(x)
## [1] 1 4 2 5 3 6
Note that the matrix
values are always arranged by column in the resulting vector!
Does the
volcano
matrix contain anyNA
values? How can we check?
5.1.4.2 matrix
→ data.frame
The as.data.frame
function converts a matrix
to a data.frame
:
as.data.frame(x)
## V1 V2 V3
## 1 1 2 3
## 2 4 5 6
Note that row and column names are automatically generated (if they do not exist) as part of the conversion, since they are mandatory in a data.frame
(Section 5.1.3.2).
5.1.5 Transposing a matrix
The t
function transposes a matrix
. In other words, the matrix rows and columns are “switched”—rows become columns and columns become rows:
= matrix(1:6, ncol = 3)
x
x## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
t(x)
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6
What will be the result of
t(t(x))
?
5.1.6 Image with contours
Using the image
and contour
functions we can graphically display matrix
values in the form of a heatmap—where values are encoded to colors—and a contour layer, respectively.
The color scale in image
can be set with col
. R has several built-in color scale functions—such as terrain.colors
and heat.colors
—that return a specified number of color codes from the given color scale. The hcl.colors
function offers a wide range of color scales useful for mapping, including the well-known ColorBrewer scales. The hcl.colors
function requires the number of colors and the name of the color scale. For example, the following expression returns 11
colors from the "Spectral"
color scale:
hcl.colors(11, "Spectral")
## [1] "#A71B4B" "#D44D35" "#ED820A" "#F7B347" "#FCDE85" "#FEFDBE" "#BAEEAE"
## [8] "#61D4AF" "#00B1B5" "#0084B3" "#584B9F"
Try replacing
terrain.colors(30)
withheat.colors(30)
, or withhcl.colors(11, "Spectral")
, to see an image of thevolcano
matrix with other color scales. You can runhcl.pals()
to get a list of possible scale types passed tohcl.colors
(other than"Spectal"
) and try some of those too.
The x/y aspect ratio in either image
or contour
can be set with asp
.
Finally, add=TRUE
can be used so that the contour
layer is added on top of an existing plot, such as the one created with image
, rather than initiated in a new plot.
Combining all of the above, the following expressions display a contour layer on top of a colored image of the volcano
matrix. The resulting image is shown in Figure 5.1.
image(volcano, col = terrain.colors(30), asp = ncol(volcano) / nrow(volcano))
contour(volcano, add = TRUE)
Note that the argument asp=ncol(volcano)/nrow(volcano)
makes sure the ratio between the x and y axes is 1:1, i.e., that the image cells are rectangular17.
5.1.7 Matrix subsetting
5.1.7.1 Individual rows and columns
Similarly to what we learned about data.frame
(Section 4.1.5), matrix
indices are two-dimensional. The first value refers to rows and the second value refers to columns. For example:
x## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
2, 1] # Row 2, column 1
x[## [1] 2
1, 3] # Row 1, column 3
x[## [1] 5
The following examples subset the volcano
matrix:
20, 40] # Row 20, column 40
volcano[## [1] 190
81, 61] # Row 81, column 61
volcano[## [1] 95
How can we find out the top-right corner value of
volcano
?
Complete rows or columns can be accessed by leaving a blank space instead of the row or column index. By default, a subset that comes from a single row or a single column is simplified to a vector:
x## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
2, ] # Row 2
x[## [1] 2 4 6
2] # Column 2
x[, ## [1] 3 4
To “suppress” the simplification of individual rows/columns to a vector, we can use the drop=FALSE
argument (Section 4.1.5.3):
x## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
2, , drop = FALSE] # Row 2
x[## [,1] [,2] [,3]
## [1,] 2 4 6
2, drop = FALSE] # Column 2
x[, ## [,1]
## [1,] 3
## [2,] 4
When referring to an elevation matrix, such as volcano
, any given row or column subset is actually an elevation profile. For example, the following expressions extract two elevation profiles from volcano
:
= volcano[30, ] # Row 30
r30 = volcano[70, ] # Row 70 r70
Figure 5.2 graphically displays those profiles (Section 3.2.2):
plot(r30, type = "o", col = "blue", ylim = range(c(r30, r70)), ylab = "Elevation (m)")
lines(r70, type = "o", col = "red")
Figure 5.3 shows the location of the two profiles in a 3D image of volcano
.
5.1.7.2 Subsets with >1 row or column
We can also use vectors of length >1 when subsetting:
x## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
1:2]
x[, ## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
2, c(1,3)]
x[## [1] 2 6
2, c(1,3), drop = FALSE]
x[## [,1] [,2]
## [1,] 2 6
We can assign new values to subsets of a matrix
:
= matrix(NA, ncol = 3, nrow = 3)
m
m## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] NA NA NA
## [3,] NA NA NA
2:3, 1:2] = 1
m[
m## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] 1 1 NA
## [3,] 1 1 NA
1:2, 2:3] = 100
m[
m## [,1] [,2] [,3]
## [1,] NA 100 100
## [2,] 1 100 100
## [3,] 1 1 NA
5.1.8 Summarizing rows and columns
How can we calculate the row or column means of a matrix? One way is to use a for
loop (Section 4.2.3), as follows:
= rep(NA, nrow(volcano))
result for(i in 1:nrow(volcano)) {
= mean(volcano[i, ])
result[i] }
The resulting vector of row means can be visualized as follows (Figure 5.4):
plot(result, type = "b", xlab = "Row", ylab = "Elevation (m)")
What changes do we need to make in the
for
loop code to calculate column means?
We can use the apply
function (Section 4.5) to do the same, using much shorter code:
= apply(volcano, 1, mean) result
What changes do we need to make in the
apply
expression to calculate column means?
Moreover, for the special case of mean
there are further shortcuts, named rowMeans
and colMeans
18:
= rowMeans(volcano) result
Note: in both cases we can use na.rm
to determine whether NA
values are included in the calculation (default is FALSE
).
How can we check whether the above two expressions give exactly the same result?
5.2 Arrays
5.2.1 Creating an array
An array is a data structure that contains values of the same type and can have any number of dimensions. We may therefore consider a vector (1 dimension) and a matrix
(2 dimensions) as special cases of an array. In this scope of this book, we will be dealing only with three-dimensional arrays, which—as we will see later on (Section 5.3.8.2)—are used to store multi-band raster values in stars
raster objects.
An array
can be created with the array
function, specifying the values (data
) and the required dimensions (dim
). Note that creating arrays, with the array
function, is very similar to the way we create matrices, with the matrix
function—just more general. For example, the following expression creates an array with the values 1:24
and three dimensions—2
rows, 3
columns and 4
“layers” (Figure 5.5):
= array(data = 1:24, dim = c(2, 3, 4))
y
y## , , 1
##
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
##
## , , 3
##
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18
##
## , , 4
##
## [,1] [,2] [,3]
## [1,] 19 21 23
## [2,] 20 22 24
5.2.2 Array subsetting
Subsetting an array
works similarly to matrix
subsetting (Section 5.1.7), only that we can have any number of indices—corresponding to the number of array
dimensions—rather than two. Accordingly, when subsetting a three-dimensional array we need to provide three indices. For example, here is how we can extract a particular row, column or “layer” from a three-dimensional array (Figure 5.6):
1, , , drop = FALSE] # Row 1
y[## , , 1
##
## [,1] [,2] [,3]
## [1,] 1 3 5
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 7 9 11
##
## , , 3
##
## [,1] [,2] [,3]
## [1,] 13 15 17
##
## , , 4
##
## [,1] [,2] [,3]
## [1,] 19 21 23
1, , drop = FALSE] # Column 1
y[, ## , , 1
##
## [,1]
## [1,] 1
## [2,] 2
##
## , , 2
##
## [,1]
## [1,] 7
## [2,] 8
##
## , , 3
##
## [,1]
## [1,] 13
## [2,] 14
##
## , , 4
##
## [,1]
## [1,] 19
## [2,] 20
1, drop = FALSE] # "Layer" 1
y[, , ## , , 1
##
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
We can also subset two or three dimensions at a time, to get an individual row/column, row/layer, column/layer or row/column/layer combination (Figure 5.7):
1, 1, ] # Row 1, column 1
y[## [1] 1 7 13 19
1, , 1] # Row 1, "layer" 1
y[## [1] 1 3 5
1, 1] # Column 1, "layer" 1
y[, ## [1] 1 2
1, 1, 1] # Row 1, column 1, "layer" 1
y[## [1] 1
5.2.3 Using apply
with arrays
When using apply
on a 3-dimensional array
, we can apply a function:
- On one of the dimensions: rows (
1
), columns (2
), or “layers” (3
) - On a combinations of any two dimensions: row/column combinations (
1:2
), row/“layer” combinations (c(1,3)
), or column/“layer” combinations (2:3
)
Here are the four most useful dimension combinations with respect to arrays representing spatial data (Section 5.3):
apply(y, 1, mean) # Row means
## [1] 12 13
apply(y, 2, mean) # Column means
## [1] 10.5 12.5 14.5
apply(y, 3, mean) # "Layer" means
## [1] 3.5 9.5 15.5 21.5
apply(y, 1:2, mean) # Row/Column combination ("pixel") means
## [,1] [,2] [,3]
## [1,] 10 12 14
## [2,] 11 13 15
5.2.4 Basic data structures in R
So far we met four out of five of the basic data structures in R (Table 5.1), so it is time for a short summary. We can classify the basic data structures in R based on:
- Number of dimensions—One-dimensional, two-dimensional or n-demensional19
- Homogeneity—homogeneous (values of the same type) or heterogeneous (values of different types)
Number of dimensions | Homogeneous | Heterogeneous |
---|---|---|
one-dimensional | vector (Chapter 2) | list (Chapter 11) |
two-dimensional | matrix (Chapter 5) |
data.frame (Chapter 4) |
n-dimensional | array (Chapter 5) |
Most of the data structures in R are combinations of the basic five ones.
5.3 Rasters
5.3.1 What is a raster?
A raster (Figure 5.8) is basically a matrix or an array, representing a rectangular area on the surface of the earth. To associate the matrix or the array with the particular area it represents, the raster has some additional spatial properties, on top of the non-spatial properties that any ordinary matrix or array has:
- Non-spatial properties
- Values
- Dimensions (rows, columns, layers)
- Spatial properties
- Extent
- Coordinate Reference System (CRS)
- (Resolution)
Raster extent is the range of x- and y-axis coordinates that the raster occupies (Figure 5.9). The Coordinate Reference System (CRS) is the particular system that “associates” the raster coordinates (which are just pairs of x/y values) to geographic locations. Raster resolution is the size of a raster cell, in the x and y directions. The resolution is listed in parentheses because it can be calculated given the extent and the number of rows and columns. For example, the x-axis resolution is equal to the x-axis range difference (i.e., \(x_{max} - x_{min}\)) divided by the number of columns. In the leftmost panel in Figure 5.9, the the x-axis and y-axis resolutions are both equal to \(\frac{8-0}{12}=\frac{8}{12}\approx 0.67\).
5.3.2 Raster file formats
Commonly used raster file formats (Table 5.2) can be divided in two groups. “Simple” raster file formats, such as GeoTIFF, are single-band or multi-band rasters (Figure 5.10) with a geo-referenced extent, as discussed above (Section 5.3.1). “Complex” raster file formats, such as HDF, contain additional complexity, such as more than three dimensions (Figure 5.11), and/or metadata, such as band names, time stamps, units of measurement, and so on.
Type | Format | File extension |
---|---|---|
“Simple” | GeoTIFF | .tif |
Erdas Imagine Image | .img |
|
“Complex” (>3D and/or metadata) | HDF | .hdf , he5 , and other |
NetCDF | .nc |
5.3.3 Using R packages
Functions for working with rasters are not included in the “base” R installation, but in external packages, such as the stars
package, which we are going to learn about in a moment (Section 5.3.5). An R package is a collection of code files used to load objects—mostly functions—into memory. All object definitions in R are contained in packages. To use a particular object, we need to:
- Install the package with the
install.packages
function (once, for every installation of R20) - Load the package using the
library
function (every time we start a new R session)
Loading a package with library
basically means that all of its code files are executed, loading all objects the package defined into the RAM. However, all function calls we used until now did require installing or loading a package. Why is that? Because several packages are installed along with R and loaded on R start-up. There are several more packages which are installed by default but not loaded on start-up (total of ~30) (Table 5.3).
Package | Version | Priority | Loaded |
---|---|---|---|
base |
4.0.3 | base | Yes |
datasets |
4.0.3 | base | Yes |
graphics |
4.0.3 | base | Yes |
grDevices |
4.0.3 | base | Yes |
methods |
4.0.3 | base | Yes |
stats |
4.0.3 | base | Yes |
utils |
4.0.3 | base | Yes |
compiler |
4.0.3 | base | No |
grid |
4.0.3 | base | No |
parallel |
4.0.3 | base | No |
splines |
4.0.3 | base | No |
stats4 |
4.0.3 | base | No |
tcltk |
4.0.3 | base | No |
tools |
4.0.3 | base | No |
boot |
1.3-25 | recommended | No |
class |
7.3-17 | recommended | No |
cluster |
2.1.0 | recommended | No |
codetools |
0.2-18 | recommended | No |
foreign |
0.8-79 | recommended | No |
KernSmooth |
2.23-18 | recommended | No |
lattice |
0.20-41 | recommended | No |
MASS |
7.3-53 | recommended | No |
Matrix |
1.2-18 | recommended | No |
mgcv |
1.8-33 | recommended | No |
nlme |
3.1-150 | recommended | No |
nnet |
7.3-14 | recommended | No |
rpart |
4.1-15 | recommended | No |
spatial |
7.3-11 | recommended | No |
survival |
3.2-7 | recommended | No |
Most of the ~16,000 R packages (as of October 2020) are not installed by default. To use one of these packages we first need to install it on the computer. Installing a package is a one-time operation using the install.packages
function. After the package is installed, each time we want to use it we need to load it using library
.
In the following examples, we are going use a package called stars
to work with rasters. The stars
package is not installed with R, therefore we need to install it ourselves with:
install.packages("stars")
If the package is already installed, running install.packages
overwrites the old installation. This is done intentionally if you you want to install a newer version of the package. Once the package is already installed, we need to use the library
function to load it into memory. Note how package name can be passed to library
without quotes:
library(stars)
Other than stars
, in this book we are going to use the mapview
, cubeview
, sf
, units
, data.table
, dplyr
, gstat
and automap
packages. You can install all of them using install.packages
.
5.3.4 The raster
package
Before moving on to stars
package, the raster
package deserves to be mentioned. The raster
package is a powerful and well-established (2010-) package for working with rasters in R (Table 0.1). This package contains several classes and functions for working with rasters in R. The most important three classes from raster
are the ones for representing rasters:
RasterLayer
for single-band rastersRasterStack
andRasterBrick
for multi-band rasters21
The raster
package has several limitations, most notably that it is limited to three dimensions and cannot hold raster metadata, such as layer names and measurement units. Moreover, it is not well-integrated with the sf
package—which is the leading package for working with vector layers in R (Sections 7–8). For those reasons, in this book we will be working with the newer stars
package for rasters (Section 5.3.5).
5.3.5 The stars
package
The stars
package is a newer (2018-) R package for working with rasters (Table 0.1). The stars
package is more general, flexible, and tighly integrated with vector layer analysis in the sf
package, compared to raster
(Section 5.3.4).
The stars
package contains the stars
class for representing all types of rasters, and numerous functions for working with rasters in R. A stars
object is basically a list of matrices or arrays, along with metadata describing their dimensions. Don’t worry if this is not clear; we will elaborate on this later on (Section 6.3).
Compared to the three raster
classes (Section 5.3.4), the stars
class is more flexible and can represent more complex types of rasters:
- Rasters with more than three dimensions
- Metadata (band names, units of measurement, etc.)
- Non-standard grids (rotated, etc.)
5.3.6 Reading raster from file
The most common and most useful method for creating a raster object in R (and elsewhere) is reading from a file, such as a GeoTIFF file. We can use the read_stars
function to read a GeoTIFF file and create a stars
object in R. The first parameter (.x
) is the file path to the file we want to read, or just the file name—when the file is located in the working directory.
As an example, let’s read the dem.tif
raster file, which contains a coarse Digital Elevation Model (DEM) of the area around Haifa. First, we have to load the stars
package:
library(stars)
Then, we can use the read_stars
function to read the file22:
= read_stars("dem.tif") s
The file MOD13A3_2000_2019.tif
is a multi-band raster with monthly Normalized Difference Vegetation Index (NDVI) values in Israel, for the period between February 2000 and June 2019 from the MODIS instrument on the Terra satellite. This is a multi-band raster with 233 bands, where each band corresponds to an average monthly NDVI image (233 months total). Uncertain values were replaced with NA
. Let’s try reading this file as well, to create another stars
object in the R environment:
= read_stars("MOD13A3_2000_2019.tif") r
Two stars
objects, named r
and s
, are now available in our R session.
5.3.7 Visualization with plot
, mapview
and cubeview
5.3.7.1 Raster images with plot
The simplest way to visualize a stars
object is to use the plot
function. This produces a static image of the raster, such as the ones shown in Figures 5.12 and 5.13:
plot(s)
plot(r)
Useful additional parameters when running plot
on stars
objects include:
text_values
—Logical, whether to display text labelsaxes
—Logical, whether to display axescol
—A vector of color codes or names
For example (Figure 5.14):
plot(s, text_values = TRUE, axes = TRUE, col = terrain.colors(10))
The expression terrain.colors(10)
uses one of the built-in color palette functions in R to generate a vector of length 10 with terrain color codes, as discussed above (Section 5.1.6). The default color breaks are calculated using quantiles (breaks="quantile"
). We can use other break types (such as breaks="equal"
) or pass our own vector of custom breaks (Figure 5.15):
plot(s, text_values = TRUE, col = terrain.colors(10), breaks = "equal")
plot(s, text_values = TRUE, col = terrain.colors(3), breaks = c(0, 100, 300, 500))
Note that the number of colors in the second expression (3) matches the number of breaks minus 1 (why?).
5.3.7.2 Interactive maps with mapview
The mapview
function from package mapview
lets us visually examine spatial objects—vector layers or rasters—in an interactive map on top of various background layers, such as OpenStreetMap, satellite images, etc. We can use the mapview
function, after loading the mapview
package, as follows. In this case, we are displaying the second layer of r
(we learn about layer subsetting in Section 6.2.1):
library(mapview)
mapview(r[,,,2])
5.3.7.3 Interactive cubes with cubeview
Another useful package for examining rasters, specifically multi-band ones, is cubeview
(Figure 5.16).
library(cubeview)
cubeview(r)
Try pressing the arrow keys, Page Up, and Page Down to scroll through the raster row, columns, and layers.
5.3.8 Raster values and properties
5.3.8.1 Class and structure
The print
method for raster objects gives a summary of their properties:
s## stars object with 2 dimensions and 1 attribute
## attribute(s):
## dem.tif
## Min. : 3.0
## 1st Qu.: 7.0
## Median : 39.0
## Mean :110.5
## 3rd Qu.:179.0
## Max. :448.0
## NA's :17
## dimension(s):
## from to offset delta refsys point values x/y
## x 1 7 679624 2880 UTM Zone 36, Northern Hem... FALSE NULL [x]
## y 1 10 3644759 -2880 UTM Zone 36, Northern Hem... FALSE NULL [y]
The class
function returns the class name, which is stars
in this case:
class(s)
## [1] "stars"
As discussed in Section 1.1.5, a class is a “template” with pre-defined properties that each object of that class has. For example, a stars
object is actially a collection (namely, a list
) of matrix
or array
objects, along with additional properties of the dimensions, such as dimension names, Coordinate Reference Systems (CRS), etc. When we read the dem.tif
file with read_stars
, the information regarding all of the properties was transferred from the file and into the stars
“template.” Now, the stars
object named s
is in the RAM, filled with the specific values from dem.tif
.
We can display the structure of the stars
object with the specific values with str
:
str(s)
## List of 1
## $ dem.tif: num [1:7, 1:10] NA NA NA NA NA 3 3 NA NA NA ...
## - attr(*, "dimensions")=List of 2
## ..$ x:List of 7
## .. ..$ from : num 1
## .. ..$ to : num 7
## .. ..$ offset: num 679624
## .. ..$ delta : num 2880
## .. ..$ refsys:List of 2
## .. .. ..$ input: chr "UTM Zone 36, Northern Hemisphere"
## .. .. ..$ wkt : chr "PROJCRS[\"UTM Zone 36, Northern Hemisphere\",\n BASEGEOGCRS[\"WGS 84\",\n DATUM[\"unknown\",\n "| __truncated__
## .. .. ..- attr(*, "class")= chr "crs"
## .. ..$ point : logi FALSE
## .. ..$ values: NULL
## .. ..- attr(*, "class")= chr "dimension"
## ..$ y:List of 7
## .. ..$ from : num 1
## .. ..$ to : num 10
## .. ..$ offset: num 3644759
## .. ..$ delta : num -2880
## .. ..$ refsys:List of 2
## .. .. ..$ input: chr "UTM Zone 36, Northern Hemisphere"
## .. .. ..$ wkt : chr "PROJCRS[\"UTM Zone 36, Northern Hemisphere\",\n BASEGEOGCRS[\"WGS 84\",\n DATUM[\"unknown\",\n "| __truncated__
## .. .. ..- attr(*, "class")= chr "crs"
## .. ..$ point : logi FALSE
## .. ..$ values: NULL
## .. ..- attr(*, "class")= chr "dimension"
## ..- attr(*, "raster")=List of 3
## .. ..$ affine : num [1:2] 0 0
## .. ..$ dimensions : chr [1:2] "x" "y"
## .. ..$ curvilinear: logi FALSE
## .. ..- attr(*, "class")= chr "stars_raster"
## ..- attr(*, "class")= chr "dimensions"
## - attr(*, "class")= chr "stars"
The unclass
function removes the class definition from an R object. This is another convenient way to demonstrate the fact that a stars
object s
is composed of a numeric matrix
along with the spatial properties of the x and y dimensions:
unclass(s)
## $dem.tif
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA NA 3 5 7
## [2,] NA NA NA 61 106 47 31 39 32 49
## [3,] NA NA NA 9 132 254 233 253 199 179
## [4,] NA NA NA 4 11 146 340 383 357 307
## [5,] NA 4 6 9 6 6 163 448 414 403
## [6,] 3 6 9 10 6 6 13 152 360 370
## [7,] 3 4 7 16 27 12 64 39 48 55
##
## attr(,"dimensions")
## from to offset delta refsys point values x/y
## x 1 7 679624 2880 UTM Zone 36, Northern Hem... FALSE NULL [x]
## y 1 10 3644759 -2880 UTM Zone 36, Northern Hem... FALSE NULL [y]
5.3.8.2 Raster attributes and values
stars
objects are collections of matrices or arrays. Each matrix or array is known as an attribute and is associated with a name. A GeoTIFF file, being a simple raster format (Section 5.3.2), can contain just one attribute. Attribute names are not specified as part of a GeoTIFF file, and therefore automatically given default values based on the file name. We can get the attribute name(s) using the names
function:
names(s)
## [1] "dem.tif"
names(r)
## [1] "MOD13A3_2000_2019.tif"
We can change the attribute names through assignment. For example, it makes sense to name the attribute after the physical property or measurement it represents:
names(s) = "elevation"
names(s)
## [1] "elevation"
names(r) = "NDVI"
names(r)
## [1] "NDVI"
Accessing an attrubute, by name or by numeric index, returns the matrix
(single-band raster) or array
(multi-band raster) object with the values of that attribute. For example, in the following expressions we access the (only) attribute of the s
raster by name:
$elevation
s## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA NA 3 5 7
## [2,] NA NA NA 61 106 47 31 39 32 49
## [3,] NA NA NA 9 132 254 233 253 199 179
## [4,] NA NA NA 4 11 146 340 383 357 307
## [5,] NA 4 6 9 6 6 163 448 414 403
## [6,] 3 6 9 10 6 6 13 152 360 370
## [7,] 3 4 7 16 27 12 64 39 48 55
class(s$elevation)
## [1] "matrix" "array"
dim(s$elevation)
## x y
## 7 10
and here we do the same for r
:
class(r$NDVI)
## [1] "array"
dim(r$NDVI)
## x y band
## 167 485 233
and here we do the same using a numeric index and the [[
operator:
class(s[[1]])
## [1] "matrix" "array"
dim(s[[1]])
## x y
## 7 10
class(r[[1]])
## [1] "array"
dim(r[[1]])
## x y band
## 167 485 233
The $
and [[
operators actually select individual elements from a list
; this works because a stars
object is, internally, a list
of matrices or arrays (i.e., the “attributes”). We will elaborate on lists and the [[
operator in Section 11.1.
By now we met all three subset operators ([
, $
, and [[
) which we are going to use in this book (Table 5.4).
Syntax | Objects | Returns |
---|---|---|
x[i] |
vector, list |
Subset i |
x[i, j] |
data.frame , matrix |
Subset i,j |
x[i, j, k] |
array |
Subset i,j,k |
x[[i]] |
vector, list |
Single element i |
x$i |
data.frame , list |
Single column or element i |
x@n |
S4 objects | Slot n |
5.3.8.3 Dimensions and spatial properties
The nrow
, ncol
and dim
functions return the number or rows, column or all available dimensions of a stars
object, respectively. These functions return a named numeric
vector, where the names correspond to dimension names (Section 6.3.2). For example:
nrow(s)
## x
## 7
ncol(s)
## y
## 10
dim(s)
## x y
## 7 10
nrow(r)
## x
## 167
ncol(r)
## y
## 485
dim(r)
## x y band
## 167 485 233
As mentioned above (Section 5.3.1), the spatial properties, determining raster placement in geographical space, include the extent, the Coordinate Reference System (CRS) and the resolution.
For example, the CRS definition of a stars
object can be accessed with the st_crs
function. Here are the CRS definitions of rasters s
and r
:
st_crs(s)
## Coordinate Reference System:
## User input: UTM Zone 36, Northern Hemisphere
## wkt:
## PROJCRS["UTM Zone 36, Northern Hemisphere",
## BASEGEOGCRS["WGS 84",
## DATUM["unknown",
## ELLIPSOID["WGS84",6378137,298.257223563,
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]]],
## CONVERSION["Transverse Mercator",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",33,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["easting",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["northing",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
st_crs(r)
## Coordinate Reference System:
## User input: unnamed
## wkt:
## PROJCRS["unnamed",
## BASEGEOGCRS["unnamed ellipse",
## DATUM["unknown",
## ELLIPSOID["unnamed",6371007.181,0,
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433,
## ID["EPSG",9122]]]],
## CONVERSION["Sinusoidal",
## METHOD["Sinusoidal"],
## PARAMETER["Longitude of natural origin",0,
## ANGLEUNIT["degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["False easting",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["easting",east,
## ORDER[1],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]],
## AXIS["northing",north,
## ORDER[2],
## LENGTHUNIT["metre",1,
## ID["EPSG",9001]]]]
The CRS definition is an object of class crs
, which contains the textual definition of the CRS, in the “WKT” format. We will see later on, the crs
of a spatial layer can be initialized using an EPSG code (Section 7.9.2) or by importing it from an existing layer. Therefore, in practice, there is no need to type-in the WKT definition manually.
The extent (“bounding boxes”) can be accessed with st_bbox
. Here are the extents of s
and r
:
st_bbox(s)
## xmin ymin xmax ymax
## 679624 3615959 699784 3644759
st_bbox(r)
## xmin ymin xmax ymax
## 3239946 3267745 3394692 3717158
The extent is returned as an object of class bbox
. The object is basically a numeric vector of length 4, including the xmin
, ymin
, xmax
and ymax
values (in that order).
The resolution, as well as other properties of a stars
dimensions, can be accessed using the st_dimensions
function. We will elaborate on this in Section 6.3.1. In the meanwhile, for completeness, here are the resolutions ("delta"
) of the x- and y-axes of s
and r
:
st_dimensions(s)$x$delta
## [1] 2880
st_dimensions(s)$y$delta
## [1] -2880
st_dimensions(r)$x$delta
## [1] 926.6254
st_dimensions(r)$y$delta
## [1] -926.6254
Note that the resolution is separate for the "x"
and "y"
dimensions. The (absolute) resolution is usually equal for both, in which case raster pixels are square. However, the "x"
and "y"
resolutions can also be unequal, in which case raster pixels are non-square rectangles.
5.3.8.4 Accessing raster values
As shown above (Section 5.3.8.2), raster values can be accessed directly, as a matrix
or an array
, by selecting a raster attribute, either by name or by index. For example:
1]]
s[[## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] NA NA NA NA NA NA NA 3 5 7
## [2,] NA NA NA 61 106 47 31 39 32 49
## [3,] NA NA NA 9 132 254 233 253 199 179
## [4,] NA NA NA 4 11 146 340 383 357 307
## [5,] NA 4 6 9 6 6 163 448 414 403
## [6,] 3 6 9 10 6 6 13 152 360 370
## [7,] 3 4 7 16 27 12 64 39 48 55
A histogram can give a first impression of the raster values distribution (Figure 5.17), using the hist
function:
hist(s[[1]])
For example, the above histogram tells us that the overall range of elevation values in the raster s
is roughly 0-450 (meters), but most pixels are in the 0-50 range.
Note that we can pass the matrix
(or array
) of raster values to many other functions that accept a numeric vector, such as mean
or range
.
Calculate the mean, minimum and maximum of the cell values in the raster
s
(excludingNA
).
The matrix
or array
rows and columns are reversed, compared to the visual arrangement of the raster, because in a stars
object the first dimension (matrix rows) refers to x
(raster columns) and the second dimension (matrix columns) refers to y
(raster rows)! Therefore, for example, the following expression gives the 7th column of the matrix
with the raster values, which is actually the 7th row in the raster (Figure 5.14):
1]][, 7]
s[[## [1] NA 31 233 340 163 13 64
We can modify (a subset of) raster values using assignment. For example, the following code creates a copy of the raster s
, named u
, then replaces the values in the 7th row with the value -1
:
= s
u 1]][, 7] = -1 u[[
The result is shown in Figure 5.18:
plot(u, text_values = TRUE, axes = TRUE, col = terrain.colors(8))
We can even replace the entire matrix
or array
of values with a custom one. This can be done using assignment to an “empty” subset, which implies selecting all cells, as in r[[1]][]
. For example, the following code section creates another copy named u
, then replaces all values with a consecutive vector:
= s
u 1]][] = 1:length(u[[1]])
u[[1]]
u[[## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 8 15 22 29 36 43 50 57 64
## [2,] 2 9 16 23 30 37 44 51 58 65
## [3,] 3 10 17 24 31 38 45 52 59 66
## [4,] 4 11 18 25 32 39 46 53 60 67
## [5,] 5 12 19 26 33 40 47 54 61 68
## [6,] 6 13 20 27 34 41 48 55 62 69
## [7,] 7 14 21 28 35 42 49 56 63 70
The result is shown in Figure 5.19:
plot(u, text_values = TRUE, axes = TRUE, col = terrain.colors(10))
Sometimes it is useful to assign the same value to all cells, to create a uniform raster. This can be done using assignment of a single value, which is replicated, to the subset of all raster cells:
= s
u 1]][] = -3
u[[1]]
u[[## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3
## [2,] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3
## [3,] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3
## [4,] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3
## [5,] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3
## [6,] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3
## [7,] -3 -3 -3 -3 -3 -3 -3 -3 -3 -3
The result is shown in Figure 5.20:
plot(u, text_values = TRUE, axes = TRUE, col = terrain.colors(1))
## Warning in plot.stars(u, text_values = TRUE, axes = TRUE, col =
## terrain.colors(1)): breaks="quantile" leads to a single class; maybe try
## breaks="equal" instead?
5.3.9 Writing raster to file
We will not need to export rasters (or vector layers) in this book, since we will be working exclusively in the R environment. In practice, however, one often needs to export spatial objects from R to a file, to share them with other colleagues, further edit or process them in GIS software such as ArcGIS or QGIS, and so on.
Writing a stars
raster object to a file on disk is done using write_stars
. To run write_stars
, we need to specify:
obj
—Thestars
object to writedsn
—The file name (or file path) of the raster file being created
The function can automatically detect the required file format based on the file extension. For example, the following expression exports the stars
object named s
to a GeoTIFF file named dem_copy.tif
in the current working directory:
write_stars(s, "dem_copy.tif")
Since R 4.0.0 (https://cran.r-project.org/doc/manuals/r-devel/NEWS.html).↩︎
Due to the conventions of
image
, amatrix
image is actually reversed in 90 degrees compared to the textual representation of amatrix
. To get the same orientation as in the textual representation, useimage(t(apply(volcano,2,rev)))
↩︎There are similar functions named
rowSums
andcolSums
for calculating row and column sums, respectively.↩︎Note that there are no data structure for zero-dimensional data structures (i.e., scalars) in R.↩︎
Unless a new version of a package was released and we want to update it, in which case we need to re-install the package.↩︎
These classes are both used to represent multi-band rasters. They differ only in their internal structure.↩︎
GeoTIFF files can come with both
*.tif
and*.tiff
file extension, so if one of them does not work you should try the other.↩︎