Rasters (rasterio)
Contents
Rasters (rasterio
)¶
Last updated: 2022-05-24 13:35:03
Introduction¶
In this section we switch to the second type of spatial layers, rasters. Rasters are basically georeferenced images. That is, in additional to the numeric array that contains the image values, that every image has, a raster also has metadata specifying the rectangual extent that the image corresponds to in a particular spatial Coordinate Reference System. Rasters are stored in special formats, such as GeoTIFF (.tif
) or Erdas Imagine Image (.img
).
To work with rasters, we are going to use the rasterio
package. The rasterio
package is compatible with, and extends, the numpy
package. Specifically, we will go over the following raster workflow:
Reading raster files and examining their properties (see Raster file connection, Raster properties, and Reading raster data)
Plotting rasters (see Plotting (rasterio))
Creating a raster from a
numpy
array (see Creating raster from array)Writing raster to file (see Writing rasters)
Making calculations with raster values (see “No Data” masks, and Working with raster values)
However, the rasterio
package it is not as comprehensive as geopandas
is for working with vector layer. Therefore, in the second part of this chapter, we will demonstrate more specific tasks through additional third-party packages:
richdem
—for calculating topographic indices (see DEM calculations)scipy
—for focal filtering (see Focal filtering)
In the next chapter (see Raster-vector interactions), we are going to explore operations that involve both a raster and a vector layer, such as converting a raster to polygons or extracting raster values to points or polygons.
Loading raster packages¶
In this chapter we use quite a few more packages than in any of the previous chapters. Let us go over the packages and their purpose, and load them, before we start working.
To start working with rasterio
, first of all, we import the rasterio
package, as well as the show
function (used for raster plotting) from the rasterio.plot
sub-package:
import rasterio
from rasterio.plot import show
We will also load the numpy
package and shapely.geometry
sub-package, which we use to load a numpy
array and convert it to a raster (see Creating raster from array), and for calculating the raster bounding box (see Raster extent), respectively:
import numpy as np
import shapely.geometry
We load the glob
standard package, which we breifly use to process multiple file paths when reading raster files (see Stacking raster bands):
import glob
Finally, we load the rasterstats
(see Zonal statistics), richdem
(see DEM calculations), and scipy
(see Focal filtering) packages, which we use for specific raster processing tasks later on in this chapter. The rasterstats
package also relies on geopandas
(see Preparing the polygons), which we load as well:
import geopandas as gpd
import rasterstats
import richdem as rd
import scipy.ndimage
/home/michael/.local/lib/python3.8/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.8.0-CAPI-1.13.1 ) is incompatible with the GEOS version PyGEOS was compiled with (3.10.1-CAPI-1.16.0). Conversions between both will be slow.
warnings.warn(
What is rasterio
?¶
rasterio
is a third-party Python package for working with rasters. rasterio
makes raster data accessible in the form of numpy
arrays, so that we can operate on them, then write back to new raster files. rasterio
, like most raster processing software, is based on the GDAL program. As mentioned above, working with rasters in Python is less organized around one comprehensive package (such as the case for vector layers and geopandas
). Instead, there are several packages providing alternative (subsets of methods) of working with raster data.
The two most notable approaches for working with rasters in Python are provided by the rasterio
(which we learn about in this chapter) and xarray
packages. These two packages differ in their scope and underlying data models. Specifically, rasterio
represents rasters as numpy
arrays associated with a separate object holding the spatial metadata. The xarray
package, however, represents rasters with the native DataArray
object, which is an extension of numpy
arrays designed to hold axis labels and attributes, in the same object, together with the array of raster values.
Both the rasterio
and xarray
packages are not comprehensive in the same way as geopandas
is. For example, when working with rasterio
, on the one hand, more packages may be needed to accomplish (commonly used) tasks such as zonal statistics (package rasterstats
, see Zonal statistics) or calculating topographic indices (package richdem
, see DEM calculations). On the other hand, xarray
was extended to accommodate spatial operators missing from the core package itself, with the rioxarray
and xarray-spatial
packages.
Raster file connection¶
A raster data source, such as a GeoTIFF file, can be accessed using the rasterio.open
function. This creates a connection to the raster data. The raster properties are imported instantly (see Raster properties). The raster data, however, are not automatically imported, as they are potentially very large and memory-consuming. Raster data can be read from the raster “connection” object created with rasterio.open
in a separate step (see Reading raster data).
Note
Conceptually, a raster file connection object created with rasterio.open
is similar to a text file connection object created with open
(see File object and reading lines).
The BSV_res200-M.tif
raster, provided with the sample data, is a three-band (red, green, blue) aerial photo of Beer-Sheva, taken at 2015, at 2 \(m\) resolution. The following expression creates a connection object named src
to this raster:
src = rasterio.open("data/BSV_res200-M.tif")
src
<open DatasetReader name='data/BSV_res200-M.tif' mode='r'>
Note that the printout includes the mode='r'
part, which indicates the dataset is opened in reading mode. This is the default mode of rasterio.open
, so the following expression does exactly the same:
src = rasterio.open("data/BSV_res200-M.tif", "r")
src
<open DatasetReader name='data/BSV_res200-M.tif' mode='r'>
Later on, we will see how a raster file can also be opened in writing mode ("w"
) when our intention is to write a raster file (see Writing rasters).
Raster properties¶
Overview¶
As mentioned above, the rasterio.open
function creates a “connection” to a raster file, without reading the data, i.e., without actually reading the raster values. However, the essential file properties, or metadata, are being read and consequently contained in the object (such as r
). Let us go over these properties now.
The file name and connection mode can be obtained, from a raster file connection object, through its .name
and .mode
properties, respectively:
src.name
'data/BSV_res200-M.tif'
src.mode
'r'
Most of the other raster object properties can be accessed at once, through the .meta
property, which returns a dict
as follows:
src.meta
{'driver': 'GTiff',
'dtype': 'uint8',
'nodata': None,
'width': 10000,
'height': 10000,
'count': 3,
'crs': None,
'transform': Affine(2.0, 0.0, 170000.0,
0.0, -2.0, 580000.0)}
There are also specific properties for direct access, as shown below.
Raster dimensions¶
The most fundamental property of a raster is its dimensions, namely, the number of layers, rows, and columns. These are stored in the following raster object properties:
.count
—Number of layers.height
—Number of rows.width
—Number of columns
For example, the raster BSV_res200-M.tif
has three layers (corresponding to red, green, and blue channels), and 10,000 rows and columns:
src.count
3
src.height
10000
src.width
10000
Raster CRS¶
The raster Coordinate Reference System (CRS) is stored in the .crs
property of the raster object, in the same format as we have seen for vector layers (see CRS and reprojection). The raster BSV_res200-M.tif
does not contain a CRS definition, thus r.crs
is equal to None
and nothing is printed:
src.crs
Raster extent¶
Other than the CRS, for the raster to be geo-referenced we need to know the coordinates of the pixels. Since a raster is a regular rectangular grid, it is enough to know either one of the following:
The origin and the resolution, i.e., the coordinates of one corner, typically the top-left (
xmin
,ymax
) and the resolution (delta_x
,delta_y
)The bounding box, i.e., the coordinates of the bottom-left and the top-right corners (
xmin
,ymin
,xmax
,ymax
), whereas the resolution can be determined according to the number of rows and columns
In rasterio
, the extent information is given in the form of a transformation matrix, in the .transform
property:
src.transform
Affine(2.0, 0.0, 170000.0,
0.0, -2.0, 580000.0)
The values in the transform matrix provide the origin and resolution information, in the following form:
Affine(`delta_x`, 0.0, `xmin`,
0.0, `delta_y`, `ymax`)
Notably, the x-axis and y-axis resolutions are identical, which is often the case, since raster pixels usually represent squares. The y-axis resolution is negative since, by convention, the origin is defined as the top-left corner (i.e., the y-axis origin is ymax
and not ymin
). Therefore, with each step along the raster rows we are actually going down rather than up, and y-axis values get progressively smaller.
The matrix can be multiplied by a pixel position, in the form (row,column)
, to obtain the coordinates of that pixel. For example, here are the coordinates of the top-left corner, i.e., row 0
and column 0
:
src.transform * (0, 0)
(170000.0, 580000.0)
Exercise 10-a
Modify the above expression to find the coordinates of the bottom-right corner of the raster.
As we have just seen, the coordinates of the raster corners, i.e., its extent or bounds, can be derived from the .transform
property and raster dimensions. However, the extent is also provided directly through the .bounds
property:
src.bounds
BoundingBox(left=170000.0, bottom=560000.0, right=190000.0, top=580000.0)
The get one of the specific values we can further specify one of the internal properties, left
, bottom
, right
, or top
(corresponding to xmin
, ymin
, xmax
, and ymax
, respectively). For example:
src.bounds.left
170000.0
We can also convert the .bounds
object to a shapely
geometry using the shapely.geometry.box
function (see Bounds (shapely)), as follows. This is very useful in case we need to evaluate the raster extent with respect to other layers, for example, to figure out whether the aerial photograph captured the entire area of a particular town.
shapely.geometry.box(*src.bounds)
Note that the expression uses the positional arguments technique (see Variable-length positional arguments).
Exercise 10-b
How can we create a
GeoDataFrame
with one polygon feature, representing the raster bounds?
Plotting (rasterio
)¶
Plotting a raster¶
To get a sense of the information in the raster, a basic plot can be produced using the show
function. When the raster file connection contains three bands, these are automatically teanslated to an RGB image:
show(src);

We can also pass a tuple with a raster file connection and an index of a particular band, in which case we get an image of that band only:
show((src, 1));

When displaying a single band, we can specify a color scale with the cmap
argument, similarly to .plot
in geopandas
(see Setting symbology). For example:
show((src, 1), cmap="Spectral");

Note
See the Plotting section in the rasterio
documentation for more details and examples: https://rasterio.readthedocs.io/en/latest/topics/plotting.html.
Plotting a raster and a vector layer¶
To plot a raster and a vector layers together, we need to set a plot object, and pass the same ax
argument to both plots. For example, let us import the railway lines layer and filter the active lines (see Reading vector layers and Filtering by attributes):
rail = gpd.read_file("data/RAIL_STRATEGIC.shp")
rail = rail[rail["ISACTIVE"] == "פעיל"]
Here is how we can plot the railway lines and the aerial photo together:
fig, ax = plt.subplots()
base = show(src, ax=ax)
rail.plot(ax=ax, edgecolor="red");

Keep in mind that, as usual, that both layers need to be in the same CRS for an operation involving both to make sense. In this case the BSV_res200-M.tif
is in the ITM (EPSG 6991
) CRS, same as RAIL_STRATEGIC.shp
, even though the CRS definition is missing from the .tif
file.
Reading raster data¶
Overview¶
As shown above, opening a raster file with rasterio.open
creates a file connection object, containing raster metadata (see Raster file connection). However, the raster data, that is, the array of raster values, is not read into memory. To read the raster values we need to use the .read
method of the file connection object. The separation of reading raster values may feel cumbersome. However, this type of design has advantages in terms of memory-use efficiency. For example, in case the raster is very large we can read only part of the information, or process it in consecutive “chunks”, so that we do not consume too much RAM which will make the computer “freeze”.
Reading all bands¶
To read all raster bands into an in-memory numpy
array, we use the .read
method without specifying any parameters. For example, the following expression reads the raster values from all bands, into an ndarray
object named r
:
r = src.read()
r
array([[[115, 132, 75, ..., 110, 103, 111],
[133, 141, 135, ..., 105, 96, 94],
[111, 147, 140, ..., 94, 97, 103],
...,
[140, 68, 94, ..., 122, 128, 129],
[148, 108, 124, ..., 131, 124, 134],
[150, 119, 136, ..., 110, 137, 130]],
[[111, 129, 75, ..., 101, 98, 106],
[132, 140, 132, ..., 99, 92, 88],
[111, 143, 136, ..., 87, 92, 96],
...,
[133, 75, 89, ..., 120, 127, 127],
[139, 105, 111, ..., 129, 126, 133],
[140, 108, 128, ..., 104, 139, 127]],
[[113, 122, 76, ..., 95, 90, 96],
[125, 132, 128, ..., 90, 84, 83],
[113, 137, 132, ..., 80, 84, 88],
...,
[123, 74, 90, ..., 119, 121, 124],
[128, 102, 104, ..., 124, 120, 127],
[131, 106, 120, ..., 104, 134, 121]]], dtype=uint8)
When reading all raster bands, we get a three-dimensional array. The order of the dimensions is (bands,rows,columns)
:
r.shape
(3, 10000, 10000)
Once we have an array with all bands, such as r
, we can always subset a particular band (or any other type of subset), using the numpy
subsetting methods (see Subsetting arrays). For example, the following expression “pulls out” a two-dimensional array with the values of the 3rd raster band:
r[2]
array([[113, 122, 76, ..., 95, 90, 96],
[125, 132, 128, ..., 90, 84, 83],
[113, 137, 132, ..., 80, 84, 88],
...,
[123, 74, 90, ..., 119, 121, 124],
[128, 102, 104, ..., 124, 120, 127],
[131, 106, 120, ..., 104, 134, 121]], dtype=uint8)
Reading specific bands¶
In case we need specific bands, to conserve memory, it is better to read just the band we need in the first place. This can be done by specifying the band index inside .read
. Note that the index starts from 1
. For example, here we read each of the bands 1
, 2
, and 3
, into separate arrays named red
, green
, and blue
, respectively:
red = src.read(1)
green = src.read(2)
blue = src.read(3)
Each of red
, green
, and blue
is a two-dimensional array. For example:
red.shape
(10000, 10000)
Again, note that .read()
imports the raster data into a three-dimensional array, while .read(1)
, .read(2)
, etc. import the raster data into a two-dimensional array. Therefore, .read()
and .read(1)
do not produce the same result even if the raster data source is one-dimensional.
It is also possible to read several bands at once, by specifying a list
of bands. For example, here we read the third and second bands (in that order):
src.read([3, 2])
array([[[113, 122, 76, ..., 95, 90, 96],
[125, 132, 128, ..., 90, 84, 83],
[113, 137, 132, ..., 80, 84, 88],
...,
[123, 74, 90, ..., 119, 121, 124],
[128, 102, 104, ..., 124, 120, 127],
[131, 106, 120, ..., 104, 134, 121]],
[[111, 129, 75, ..., 101, 98, 106],
[132, 140, 132, ..., 99, 92, 88],
[111, 143, 136, ..., 87, 92, 96],
...,
[133, 75, 89, ..., 120, 127, 127],
[139, 105, 111, ..., 129, 126, 133],
[140, 108, 128, ..., 104, 139, 127]]], dtype=uint8)
Creating raster from array¶
In some cases, we may need to go the other way around: transforming an array into a raster. For example, we may be given an array (e.g., in a CSV file) and the extent and CRS information, and asked to store the information in a spatial raster file format such as GeoTIFF. As we have seen above, a raster is just an array associated with spatial metadata, the origin and resolution (.transform
) and the CRS (.crs
). Thus going from an array to a raster technically means we need to construct the .transform
and .crs
objects.
To demonstrate, let us import the carmel.csv
matrix, which contains a Digital Elevation Model (DEM) of the area around Haifa (see Reading array from file):
m = np.genfromtxt("data/carmel.csv", delimiter = ",")
m
array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]])
Using the show
function, here is a visualization of the resulting m
array:
show(m);

The m
array is defined as float64
:
m.dtype
dtype('float64')
However, it actually contains only integers. To be more efficient, we will therefore convert (see Changing data type) it to an int16
array. Since int
arrays cannot contain “No Data” values (see “No Data” representation), we are going to encode “No Data” as -9999
, which is a value never encountered in DEMs and a common convention:
m[np.isnan(m)] = -9999
m = m.astype(np.int16)
m
array([[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
...,
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999]], dtype=int16)
Now, what is missing is the raster georeferencing information: the .transform
matrix and the CRS definition. This information needs to be known in advance. If we don’t have it, we can try to manually georeference the raster using control points (such as landmarks, trees, large stones, etc.), however this may be difficult and error-prone, unless the raster is a high-resolution aerial photograph. In this example, let us assume that the origin and resolution of the raster are known in advance. In such case, we can use the rasterio.transform.from_origin
function to create a .transform
object:
new_transform = rasterio.transform.from_origin(662317, 3658412, 90, 90)
new_transform
Affine(90.0, 0.0, 662317.0,
0.0, -90.0, 3658412.0)
Also, let us assume that we know the CRS of the raster, and that it is UTM Zone 36N (32636
). Now we have all necessary information to export the Haifa DEM into a spatial raster format such as GeoTIFF, which is what we do next.
Writing rasters¶
Writing single-band raster¶
To write a raster file, we need to open
a (non-existing) raster file in write ("w"
) mode. The best-supported and recommended format is GeoTIFF. As opposed to read mode, the open
function needs quite a lot of information (in addition to the file name and mode):
driver
—The file formatheight
—Number of rowswidth
—Number of columnscount
—Number of bandsnodata
—The value which represents “No Data”dtype
—The raster data typecrs
—The CRStransform
—The origin/resolution matrix
We already have all the necessary information. Here is the expression to open a GeoTIFF file named carmel.tif
in writing mode, with the right arguments to store the Haifa DEM values in array m
:
new_dataset = rasterio.open(
"output/carmel.tif", "w",
driver = "GTiff",
height = m.shape[0],
width = m.shape[1],
count = 1,
nodata = -9999,
dtype = m.dtype,
crs = 32636,
transform = new_transform
)
Exercise 10-c
Go over the arguments in the above
rasterio.open
function call, and make sure you understand where they come from and what is their meaning.
Once the file connection is ready, what is left to be done is just to use the .write
method to actually write the data into the file. When writing, we need to specify the array with the data to write (such as m
), and the band index where the data will be written. Note that the band index starts from 1
:
new_dataset.write(m, 1)
In the end, we must .close
the file to make sure that writing is completed:
new_dataset.close()
Stacking raster bands¶
Remote sensing products, such as satellite images, are often distributed as a collection of single-band files, one file for each spectral band of the sensor. For example, Landsat and Sentinel satellite images are distributed in this way. In this section, we are going to learn how to “stack” separate single-band raster files into a multi-band raster which is easier to work with, and to view in GIS software.
First, we will create a list
with the file paths of the separate raster files which we are going to stack. The sample image is a Sentinel-2 satellite image of central/southern Israel from 2020-12-26. Sentinel-2 satellite images are distributed in a raster format called JPEG2000 (.jp2
). The image original contains 14 bands. To keep things more simple, the sample data contains just four out of 14 files, the ones which are usually important: the red, green, blue, and Near Infrared bands. Here are their file paths:
path = "data/T36RXV_20201226T082249_"
files = [
path + "B02.jp2", # Blue
path + "B03.jp2", # Green
path + "B04.jp2", # Red
path + "B08.jp2" # NIR
]
files
['data/T36RXV_20201226T082249_B02.jp2',
'data/T36RXV_20201226T082249_B03.jp2',
'data/T36RXV_20201226T082249_B04.jp2',
'data/T36RXV_20201226T082249_B08.jp2']
The aboves code section requires that we manually type the file name components, and that we know in advance the number of files. A more automated approach would be to search the files according to a patrticular pattern inside a given directory. For example, relying on the fact that the Sentinel-2 images files we are interested in are located in the data
directory and characterized by the .jp2
extension, we can get the same list
with their paths using shorter code, and the taking advantage of the glob
standard package, as follows:
files = glob.glob("data/*.jp2")
files
['data/T36RXV_20201226T082249_B02.jp2',
'data/T36RXV_20201226T082249_B08.jp2',
'data/T36RXV_20201226T082249_B03.jp2',
'data/T36RXV_20201226T082249_B04.jp2']
However, in this case we also need to manually sort (see list methods) the files, so that they match the intended order of bands in the multi-band raster we will be writing:
files.sort()
files
['data/T36RXV_20201226T082249_B02.jp2',
'data/T36RXV_20201226T082249_B03.jp2',
'data/T36RXV_20201226T082249_B04.jp2',
'data/T36RXV_20201226T082249_B08.jp2']
Since the files comprise “bands” of the same image, the spatial metadata of all images is identical. Therefore, to write a multi-band raster we are going to use the spatial metadata from one of the images. It does not matter which one, so we are going to use the first image. First, we create the file connection:
src = rasterio.open(files[0])
src
<open DatasetReader name='data/T36RXV_20201226T082249_B02.jp2' mode='r'>
Then, we extract the spatial metadata (see Raster properties), and keep it in a separate object which we will use later on:
meta = src.meta
meta
{'driver': 'JP2OpenJPEG',
'dtype': 'uint16',
'nodata': None,
'width': 3554,
'height': 2968,
'count': 1,
'crs': CRS.from_epsg(32636),
'transform': Affine(10.000856049521643, 0.0, 652642.4115,
0.0, -10.000219878706249, 3473256.1222)}
Before we can use this meta
object when writing the multi-band raster, we need to modify two of its properties:
count
—The number of bands, needs to be4
instead of1
driver
—Needs to be GeoTIFF ("GTiff"
) instead of JPEG2000, since this is the file format is best supported for writing withrasterio
(see Writing single-band raster)
rasterio
has a specialized method named .update
, to update metadata objects, as follows:
meta.update(count=len(files))
meta.update(driver="GTiff")
Let us check the meta
object to make sure the properties were actually updated:
meta
{'driver': 'GTiff',
'dtype': 'uint16',
'nodata': None,
'width': 3554,
'height': 2968,
'count': 4,
'crs': CRS.from_epsg(32636),
'transform': Affine(10.000856049521643, 0.0, 652642.4115,
0.0, -10.000219878706249, 3473256.1222)}
Now that we have the list of files (files
), and the updated spatial metadata from one of the files (meta
), we proceed to actually writing the multi-band raster. Here is the code to do that:
with rasterio.open("output/sentinel2.tif", "w", **meta) as dst:
for id, layer in enumerate(files, start=1):
with rasterio.open(layer) as src:
dst.write(src.read(1), id)
This code section is relatively complex, so we will now go over the main points to pay attention to. First, note that we have two rasterio.open
expressions:
On the “top” level, we open a file named
"output/sentinel2.tif"
(which may not exist yet) in writing mode asdst
. This is where we write the multi-band raster. Note that we pass themeta
object as keyword arguments (see Variable-length keyword arguments) with**meta
.Inside the
for
loop which goes over thefiles
, we open each file in reading mode. Using theenumerate
technique (see Using enumerate), in each iteration we have access to the current bandid
starting from1
(i.e.,1
,2
,3
, and4
) and the current file name. Inseide the iteration we then read the first (and only) band of each file (src.read
), and immediately write it to the respective bandid
in the destination file (dst.write(..., id)
).
Also note that contexts (see Using contexts (with)) are being used to automate closing of file connections. This saves the trouble of writeing src.close()
and dst.close()
expressions.
The above code created a new file named sentinel2.tif
, which we can now read back into the Python session to work with it:
src = rasterio.open("output/sentinel2.tif")
src
<open DatasetReader name='output/sentinel2.tif' mode='r'>
Examining the metadata tells us that this is indeed a four-band raster:
src.meta
{'driver': 'GTiff',
'dtype': 'uint16',
'nodata': None,
'width': 3554,
'height': 2968,
'count': 4,
'crs': CRS.from_epsg(32636),
'transform': Affine(10.000856049521643, 0.0, 652642.4115,
0.0, -10.000219878706249, 3473256.1222)}
r = src.read()
r
array([[[1600, 1560, 1598, ..., 1272, 1315, 1316],
[1614, 1570, 1608, ..., 1196, 1262, 1316],
[1630, 1584, 1619, ..., 1131, 1191, 1295],
...,
[1597, 1576, 1599, ..., 2149, 2201, 2169],
[1602, 1597, 1569, ..., 2220, 2237, 2228],
[1577, 1577, 1563, ..., 2191, 2296, 2195]],
[[1735, 1681, 1719, ..., 1361, 1396, 1358],
[1747, 1711, 1723, ..., 1245, 1356, 1398],
[1766, 1754, 1739, ..., 1133, 1240, 1344],
...,
[1738, 1722, 1747, ..., 2501, 2602, 2647],
[1736, 1736, 1705, ..., 2655, 2684, 2671],
[1688, 1690, 1680, ..., 2652, 2719, 2693]],
[[2107, 2101, 2107, ..., 1514, 1610, 1568],
[2120, 2108, 2104, ..., 1294, 1510, 1549],
[2140, 2124, 2109, ..., 983, 1248, 1463],
...,
[2224, 2229, 2250, ..., 3265, 3337, 3440],
[2248, 2195, 2229, ..., 3486, 3468, 3556],
[2207, 2163, 2229, ..., 3452, 3574, 3529]],
[[2793, 2710, 2797, ..., 2810, 2750, 2727],
[2871, 2759, 2770, ..., 2930, 2908, 2821],
[2859, 2793, 2807, ..., 2995, 2969, 2888],
...,
[2682, 2720, 2733, ..., 3649, 3774, 3802],
[2707, 2671, 2668, ..., 3851, 3905, 3919],
[2639, 2624, 2665, ..., 3816, 4025, 3892]]], dtype=uint16)
When plotting an RGB image of the raster we need to make sure the bands are arranged in the Red-Green-Blue order, as follows. Nevertheless, something goes wrong and we see a balnk image. The problem is that (as stated in the printed message) integer values are interpreted as occupying the 0
-255
range, while in this case the values are reflectance (0
-1
) multiplied by 10000
, thus occupything the 0
-10000
range. We are going to fix this shortly (see Rescaling).
show(r[[2,1,0], :, :]);
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

“No Data” masks¶
Rasters may contain “No Data”, i.e., pixels that have missing values. For example, remote sensing data, such as satellite images, may contain “No Data” pixel values to mark unreliable or unusable values, due to atmospheric interference such as clouds. In other cases, the “No Data” values may be used to mask the pixels beyond the area of interest, such as elevation over non-land areas, such as over water bodies (see Writing rasters).
A raster file typically stores a flag that specified which specific value is used to represent “No Data”, accessible through the src.nodata
property. As discussed earlier, due to memory-related limitations (see “No Data” representation), missing values cannot be represented by np.nan
in int
arrays. Therefore, when reading int
rasters, missing values are interpreted as ordinary numeric values (such as -9999
). It is up to the user to decide what to do in such case:
Keep the raster as is, while being mindful of the numeric value representing “No Data”
Transforming the raster to
float
, where those values can be replaced withnp.nan
Even though float
rasters can hold np.nan
values, for uniformity no automatic replacement of “No Data” values to np.nan
is done by rasterio
in float
rasters either.
Let us go back to the carmel.tif
raster with the DEM of the Carmel area created earlier (see Writing rasters). Recall that the numpy
array with raster values which we wrote to file contained missing values (in the form of np.nan
), representing undetermined elevation for the Mediterranean sea. When creating the carmel.tif
raster file, we chose to represent the values using an integer data type, which, due to computer memory limitations, cannot represent np.nan
. Therefore np.nan
values were replaced with -9999
and the nodata=-9999
tag was specified in the carmel.tif
file metadata (see Writing rasters).
Indeed, if we read back the file carmel.tif
, we can see the nodata
property set to -9999
in the raster metadata:
src = rasterio.open("output/carmel.tif")
src.meta
{'driver': 'GTiff',
'dtype': 'int16',
'nodata': -9999.0,
'width': 533,
'height': 624,
'count': 1,
'crs': CRS.from_epsg(32636),
'transform': Affine(90.0, 0.0, 662317.0,
0.0, -90.0, 3658412.0)}
However, when reading the data the ordinary way with .read
, the values are read as-is, whereas the indication of “No Data” is unavailable in the resulting array:
src.read(1)
array([[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
...,
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999]], dtype=int16)
There are two ways to deal with the situation masking the “No Data” values. First, using the additional masked=True
argument, we can read the raster data into a special type of a numpy
array known as a masked array:
r = src.read(1, masked=True)
r
masked_array(
data=[[--, --, --, ..., --, --, --],
[--, --, --, ..., --, --, --],
[--, --, --, ..., --, --, --],
...,
[--, --, --, ..., --, --, --],
[--, --, --, ..., --, --, --],
[--, --, --, ..., --, --, --]],
mask=[[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
...,
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True]],
fill_value=-9999,
dtype=int16)
Detailed description of the masked array data structure is beyond the scope of this book. In short, a masked array is basically composed of two components:
.data
—The data (e.g.,dem1.data
).mask
—A boolean “NoData” mask (e.g.,dem1.mask
)
For example:
r.data
array([[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
...,
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999]], dtype=int16)
r.mask
array([[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
...,
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True],
[ True, True, True, ..., True, True, True]])
Second, alternatively, we can read the data as an ordinary array, and read the “No Data” mask into a separate array using the .read_masks
method:
r = src.read(1)
mask = src.read_masks(1)
In the resulting “NoData” array, missing values are indicated by 0
, while valid values are indicated by 255
:
r
array([[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
...,
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999]], dtype=int16)
mask
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
Working with raster values¶
Overview¶
Now that we know how to gain access to raster data, in the form of numpy
arrays, we can perform many types of common raster operations which require just the raster values. For exampe, we can:
Summarizing values—Summarize raster values, globally or separately per band
Rescaling—Linearly translate the raster values to a different data range
Calculating indices—Calculating a single-band raster using a formula, applied on several raster bands, such as calculating an NDVI image from red and near-infrared bands
Reclassifying—Convert a continuous raster to a categorical one
We will now see an example of each.
Summarizing values¶
For the next examples, let us go back to the Sentinel-2 image:
src = rasterio.open("output/sentinel2.tif")
r = src.read()
To summarize raster values, either globally, or for specific dimensions, we can use numpy
functions, possibly specifying the axis
argument (see Global summaries). For example, the “global” minimum and maximum values of the raster—across the all bands and pixels—can be obtained with:
np.min(r)
0
np.max(r)
27999
To get the minimum and maximum of all pixels separately per band, recall that rasterio
places the bands into the 0
dimension:
r.shape
(4, 2968, 3554)
Therefore, we need to specify axis=(1,2)
, i.e., the axes that we want to summarize and “discard” are 1
and 2
(see Summaries per dimension (3D)). For example, here are the minimum values in each of the four bands of the Sentinel-2 image:
np.min(r, axis=(1, 2))
array([ 94, 0, 0, 284], dtype=uint16)
and here are the maximum values:
np.max(r, axis=(1, 2))
array([27994, 26490, 27992, 27999], dtype=uint16)
Exercise 10-d
Write an expression that returns the average value per pixel.
The result should be a two-dimensional array of size
2968
*3554
(check your answer using the.shape
property).
Rescaling¶
Rescaling and calculating indices (see Calculating indices) are examples of operations that are applied on each pixel, separately, to compose a new raster with the results. This type of operations are collectively known as raster algebra.
For example, the Sentinel-2 images are stored as uint16
, for conserving storage space, but they need to be reslaced by dividing each pixel value by 10000
to get the true reflectance values. This is a common “compression” technique when storing satellite images.
To get the reflectance values, we need to rescale all values dividing the entire array by 10000
:
r = r / 10000
r
array([[[0.16 , 0.156 , 0.1598, ..., 0.1272, 0.1315, 0.1316],
[0.1614, 0.157 , 0.1608, ..., 0.1196, 0.1262, 0.1316],
[0.163 , 0.1584, 0.1619, ..., 0.1131, 0.1191, 0.1295],
...,
[0.1597, 0.1576, 0.1599, ..., 0.2149, 0.2201, 0.2169],
[0.1602, 0.1597, 0.1569, ..., 0.222 , 0.2237, 0.2228],
[0.1577, 0.1577, 0.1563, ..., 0.2191, 0.2296, 0.2195]],
[[0.1735, 0.1681, 0.1719, ..., 0.1361, 0.1396, 0.1358],
[0.1747, 0.1711, 0.1723, ..., 0.1245, 0.1356, 0.1398],
[0.1766, 0.1754, 0.1739, ..., 0.1133, 0.124 , 0.1344],
...,
[0.1738, 0.1722, 0.1747, ..., 0.2501, 0.2602, 0.2647],
[0.1736, 0.1736, 0.1705, ..., 0.2655, 0.2684, 0.2671],
[0.1688, 0.169 , 0.168 , ..., 0.2652, 0.2719, 0.2693]],
[[0.2107, 0.2101, 0.2107, ..., 0.1514, 0.161 , 0.1568],
[0.212 , 0.2108, 0.2104, ..., 0.1294, 0.151 , 0.1549],
[0.214 , 0.2124, 0.2109, ..., 0.0983, 0.1248, 0.1463],
...,
[0.2224, 0.2229, 0.225 , ..., 0.3265, 0.3337, 0.344 ],
[0.2248, 0.2195, 0.2229, ..., 0.3486, 0.3468, 0.3556],
[0.2207, 0.2163, 0.2229, ..., 0.3452, 0.3574, 0.3529]],
[[0.2793, 0.271 , 0.2797, ..., 0.281 , 0.275 , 0.2727],
[0.2871, 0.2759, 0.277 , ..., 0.293 , 0.2908, 0.2821],
[0.2859, 0.2793, 0.2807, ..., 0.2995, 0.2969, 0.2888],
...,
[0.2682, 0.272 , 0.2733, ..., 0.3649, 0.3774, 0.3802],
[0.2707, 0.2671, 0.2668, ..., 0.3851, 0.3905, 0.3919],
[0.2639, 0.2624, 0.2665, ..., 0.3816, 0.4025, 0.3892]]])
Now that we have a float64
raster with values between 0
and 1
, we can observe an RGB image of the raster as follows:
show(r[[2,1,0], :, :]);
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

Reclassifying¶
When examining the range of values, we find out that we have values above 1
, even though valid values of reflectance are between 0
and 1
:
[r.min(), r.max()]
[0.0, 2.7999]
Exercise 10-e
How many values
>1
do we have ina
? (answer:228
)How many pixels with at least one value
>1
do we have? (answer:120
)
Higher values can arise, for example, as a result from image processing and atmospheric correction algorithms. Usually, we want to replace such values with 1
. This is an example of reclassifying, grouping all or same ranges of data values in the raster into uniform categories where all pixels share the same value. In this case, we want to replace all pixels with >1
into 1
. Technically, this is a numpy
“assignment to subset” operation, which we are already familiar with (see Masking):
r[r > 1] = 1
Let us check the data range of the modified raster. Indeed, the data range is now 0
-1
:
[r.min(), r.max()]
[0.0, 1.0]
Calculating indices¶
Calculating indices is a raster algebra operation where we use a particular formula to calculate a single-band image from a multi-band image. The formula is applied on the values of each pixel, separately, and the individual calculations then comprise the resulting single-band image. There is a variety of useful indices developed in image analysis and remote sensing practice. We will see two examples:
Calculating a greyscale image from red, green, and blue bands
Calculating an NDVI image from red and near-infrared bands
Let us start with the grayscale index. Given \(R\), \(G\), and \(B\), bands, the simplest formula for a greyscale image is calculating the average reflectance from all bands:
\(Grey=\frac{R+G+B} {3}\)
In the resulting greyscale image, pixels that have high averaged reflectance will have high greyscale values and will appear bright. Pixels that have low average reflectance will appear dark.
Here is how we can claculate a greyscale image from our Sentinel-2 data:
greyscale = (r[2] + r[1] + r[0]) / 3
greyscale
array([[0.1814 , 0.17806667, 0.1808 , ..., 0.13823333, 0.14403333,
0.1414 ],
[0.1827 , 0.17963333, 0.18116667, ..., 0.1245 , 0.1376 ,
0.1421 ],
[0.18453333, 0.18206667, 0.18223333, ..., 0.10823333, 0.12263333,
0.13673333],
...,
[0.1853 , 0.18423333, 0.18653333, ..., 0.26383333, 0.27133333,
0.2752 ],
[0.1862 , 0.18426667, 0.18343333, ..., 0.2787 , 0.27963333,
0.28183333],
[0.1824 , 0.181 , 0.1824 , ..., 0.2765 , 0.2863 ,
0.28056667]])
It is appropriate to display the result with the "Greys"
color palette:
show(greyscale, cmap="Greys");

Let us move on to the second example, calculating an NDVI image. NDVI (Normalized Difference Vegetation Index) is a common remote sensing index based to Red and Near-Infrared (NIR), emphasizing green vegetation. The NDVI is defined as follows:
\(NDVI=\frac{NIR+Red} {NIR-Red}\)
Here is how we apply this formula to our Sentinel-2 image:
ndvi = (r[3]-r[2]) / (r[3] + r[2])
ndvi
array([[0.14 , 0.12658491, 0.14070147, ..., 0.29972248, 0.26146789,
0.26984866],
[0.15047085, 0.13375796, 0.13664341, ..., 0.38731061, 0.31643278,
0.29107551],
[0.14382877, 0.13605857, 0.14198535, ..., 0.5057818 , 0.40811003,
0.32751092],
...,
[0.09335508, 0.09921196, 0.09692956, ..., 0.05553949, 0.06145409,
0.04998619],
[0.0926337 , 0.09782162, 0.08964672, ..., 0.04974785, 0.05927031,
0.04856187],
[0.08914569, 0.09630249, 0.08908868, ..., 0.05008255, 0.05934991,
0.04891524]])
Since both Red and NIR range between 0
and 1
, NDVI ranges between -1
and 1
:
[ndvi.min(), ndvi.max()]
[-0.41369287905699986, 1.0]
The following expression displays the NDVI image which we created, using the "Greens"
color palette. As expected, vegetated areas (such as agricultiral fields in the top-left corner of the image) are emphasized with high NDVI values.
show(ndvi, cmap="Greens");

DEM calculations¶
Overview¶
There is a large number of specialized GIS functions and operators applicable to DEMs. These include, for example:
Topographic indices, such as slope and aspect
Flow accumulation, contributing area, and basin/channel delineation
Visibility, line of sight, and visible area
This type of operations is beyond the scope of rasterio
, but there are other, more spatialized third-party Python packages for DEM analysis. In this section, we are going to demonstrate calculation of topographic indices, namely topographic aspect, using the richdem
package.
Note
Check out the documentation of richdem
to get an idea of the other DEM-related operations available in this package: https://richdem.readthedocs.io/en/latest/intro.html.
richdem
comes with its own functions to import and export data, named rd.LoadGDAL
and rd.SaveGDAL
. Accordingly, richdem
defines its own class for storing rasters, called rdarray
. We are not going to get into details about this data structure, because we already know how to the analogous operations (such as pre-processing, or examining the properties of rasters) using rasterio
. Our strategy for working with richdem
will be as follows:
Import a DEM from a
.tif
file (which may have been prepared usingrasterio
) usingrd.LoadGDAL
Apply
richdem
functions as necessaryExport the result using
rd.SaveGDAL
(perhaps for further processing withrasterio
)
Topographic aspect¶
Our example to demontrate the workflow of richdem
is calculating a topographic aspect raster. In the derived topographic aspect raster, the value of each pixel reflects the direction where the surface of that pixel (and its \(3\times3\) surrounding neighborhood) is facing, usually in decimal degrees (0
-360
). The ArcPro documentation has a good article named How Aspect works with more details about the topographic aspect calculation.
First, we load the carmel.tif
raster (see Writing single-band raster) using function rd.LoadGDAL
from the richdem
package:
dem = rd.LoadGDAL("output/carmel.tif")
Here is what the resulting rdarray
object looks like when printed. As you can see, it is very similar to a numpy
array:
dem
rdarray([[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
...,
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999],
[-9999, -9999, -9999, ..., -9999, -9999, -9999]], dtype=int16)
Importantly, the rdarray
object keeps track of “No Data” value, which in our case is -9999
:
dem.no_data
-9999.0
Now, we need to apply the richdem
function for calculating topographic aspect given a DEM. This is done using rd.TerrainAttribute
with the argument attrib="aspect"
:
aspect = rd.TerrainAttribute(dem, attrib="aspect")
For examing the output (or further processing it), we will use the more familiar rasterio
methods. For that, we need to export the resulting aspect
object:
rd.SaveGDAL("output/carmel_aspect.tif", aspect)
Then, we can “re-import” it with rasterio.open
:
aspect = rasterio.open("output/carmel_aspect.tif")
We can now, for example, plot the aspect raster to see what it looks like:
show(aspect, cmap='Spectral');

Focal filtering¶
Our final example with rasters with rasters demonstrates yet another third-pacrty package called scipy
. scipy
is a broad-scope and established packages for scientific computing. One of the sub-domains which the package deals with is image processing. Typical image processing operations include filtering (which we demonstrate below), segmentation, and edge detection. Though image processing is not specific to spatial rasters, most operations are applicable and relevant when dealing with spatial data.
To work with scipy
, as always, we first need to import it. The scipy
is split to sub-modules addressing different domains. Each sub-module we want to work with needs to be imported separately, as if it is a standalone package. The scipy.ndimage
sub-package (which we loaded earlier, see Loading raster packages), for example, includes the image processing-related functions, such as functions for different types of filters.
The most straightforward filtering function is ndimage.uniform_filter
, where each pixel in the “filtered” image is the average of pixel values inside a moving window of a specified size. For example, window size of \(3\) implies that each pixel in the filtered image is the average of \(3\times3=9\) neighborhood centered on the “focal” pixel being calculated.
Here is how we can apply a uniform focal filter with a window size of 3 on the ndvi
image we calculated earlier. Note that ndvi
is the ndarray
with the NDVI values:
ndvi3 = scipy.ndimage.uniform_filter(ndvi, 3)
The result ndvi3
is also an ndarray
, of the same shape as ndvi
. Only the values have changed. To understand how the calculation was carried out, we can compare the average of any particular \(3\times3\) neighborhood in the original raster:
ndvi[0:3, 0:3].mean()
0.13889236590152285
with the “focal” pixel value of that neighborhood in the filtered raster:
ndvi3[1, 1]
0.13889236590152285
The values are identical, as expected.
Displaying the filtered raster should appear more blurred than the original image. This is difficult to see in our last example, because the raster is relatively large and the focal window is relatively small:
show(ndvi3, cmap="Greens");

Exercise 10-f
Try repeating the above with a larger filter size, sich as
31
(instead of3
).The result should appear more blurred: use
show
to see for yourself!
Note
There is a lot more to image processing using numpy
and scipy
than shown here. For example, check out the image processing tutorial.
More exercises¶
Exercise 10-d
Read the topographic aspect raster
carmel_aspect.tif
(see Topographic aspect), convert it tofloat
data type and replace “No Data” pixel values withnp.nan
.Reclassify (see Reclassifying) raster into four categories:
0
—North, \(aspect>315\) or \(aspect\leq45\)1
—East, \(45<aspect\leq135\)2
—South, \(135<aspect\leq225\)3
—West, \(225<aspect\leq315\)
Plot the classified image (Fig. 59)
Display the frequency (i.e., number of pixels) of each class in a barplot (Fig. 60). Use the
.plot.bar
method frompandas
to draw the barplot.

Fig. 59 Solution of exercise-10-d1: Reclassified aspect raster¶

Fig. 60 Solution of exercise-10-d2: Frequency of topographic aspect classes¶
Exercise 10-e
Using the Sentinel-2 multispectral image (
output/sentinel2.tif
), calculate an Enhanced Vegetation Index 2 image, using the formula \(EVI2 = 2.5 * ((nir - red) / (nir + 2.4 * red + 1))\)Plot the resulting image.
Create a scatterplot of NDVI vs. EVI values.