Introduction to Spatial Data Programming with R
Last updated: 2020-11-13 14:02:52
This book contains the materials of the 3-credit undergraduate course named Introduction to Spatial Data Programming with R, given at the Department of Geography and Environmental Development, Ben-Gurion University of the Negev. The course was given in 2013, and then every year from 2015 to 2020. An earlier version of the materials was published by Packt (Dorman 2014).
The structure of the book is as follows. This section (the Preface) introduces the R programming language, and shows some examples of its capabilities with respect to working with spatial data. In the main part of the book, the material is split into two parts:
- Introduction to R programming (Chapters 1–4) gives all of the necessary knowledge on the R language required before we can start working with spatial data
- Working with spatial data in R (Chapters 5–12) goes over the main methods of working with spatial data in R, including how to process rasters, vector layers, and both combined, as well as two selected more advanced topics: spatio-temporal data and spatial interpolation
Finally, the appendices contain additional information:
- Sample data used in the book (Appendix A)
- Administrative details about the course (Appendix B)
- Exercises (Appendices C–H)
- Examples of exam questions (Appendix I)
Hopefully, the text is detailed enough so that it can be used not only as course materials, but also for independent self-study.
0.2 What is R?
Notable advantages of R are that it is a full-featured programming language that is nonetheless tailored to working with data, that it is relatively simple, and that it offers a huge collection of over 100,000 functions from various areas of interest.
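As a small illustration of what being tailored to working with data means in practice, the following sketch uses only base R. The rainfall values are made up for the example.

```r
# Hypothetical monthly rainfall amounts (mm); the values are made up
rain = c(Jan = 120.5, Feb = 87.2, Mar = 45.0, Apr = 12.3)

# Arithmetic is vectorized, so no explicit loop is needed
rain_cm = rain / 10

# Logical subsetting keeps only the months above a threshold
wet = rain[rain > 50]

# Built-in summary functions work directly on the vector
total = sum(rain)
wettest = names(rain)[which.max(rain)]
```

Each step operates on the whole vector at once, which is the typical way of expressing data processing in R.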
0.3 R and analysis of spatial data
Over time, an increasing number of contributed packages for handling and analyzing spatial data in R have been developed. Today, spatial analysis is a major functionality in R. As of October 2020, there are at least 185 packages specifically addressing spatial analysis in R.
Some important events in the history of spatial analysis support in R are summarized in Table 0.1.
| Year | Event |
|---|---|
| pre-2003 | Variable and incomplete approaches ( |
| 2003 | Consensus that a package defining standard data structures should be useful; |
| 2008 | Applied Spatial Data Analysis with R, 1st ed. |
| 2013 | Applied Spatial Data Analysis with R, 2nd ed. |
| 2019 | Geocomputation with R (https://geocompr.robinlovelace.net/) |
| 2021(?) | Spatial Data Science (https://www.r-spatial.org/book/) |
The question that arises here is: can R be used as a Geographic Information System (GIS), or as a comprehensive toolbox for doing spatial analysis? The answer is definitely yes. Moreover, R has some important advantages over traditional approaches to GIS, i.e., software with graphical user interfaces such as ArcGIS or QGIS.
General advantages of Command Line Interface (CLI) software include:
- Automation—Doing otherwise unfeasible repetitive tasks
- Reproducibility—Precise control of instructions to the computer
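To illustrate the automation point, the sketch below applies the same processing step to several files in a loop. The file names and their contents are hypothetical; they are created in a temporary directory so that the example is self-contained.

```r
# Create three small CSV files in a temporary directory (made-up data)
folder = file.path(tempdir(), "stations")
dir.create(folder, showWarnings = FALSE)
for(i in 1:3) {
  d = data.frame(day = 1:5, temp = 20 + i + (1:5))
  write.csv(d, file.path(folder, paste0("station_", i, ".csv")), row.names = FALSE)
}

# Apply the same step to every file: read it and calculate the mean temperature
files = list.files(folder, pattern = "\\.csv$", full.names = TRUE)
result = sapply(files, function(f) mean(read.csv(f)$temp))
names(result) = basename(files)
result
```

The same script works unchanged for three files or for three thousand, and rerunning it reproduces exactly the same steps.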
Moreover, specific strengths of R as a GIS are:
- R capabilities in data processing and visualization, combined with dedicated packages for spatial data
- A single environment encompassing all analysis aspects—acquiring data, computation, statistics, visualization, Web, etc.
Nevertheless, there are situations when other tools are needed:
- Interactive editing or georeferencing (but see
- Unique GIS algorithms (3D analysis, label placement, splitting lines at intersections)
- Data that cannot fit in RAM (but R can connect to spatial databases and other software for working with big data)
The following sections (0.3.2–0.3.11) highlight some of the capabilities of spatial data analysis packages in R, through short examples. We are going to elaborate on most of these packages later on in the book, where many of these examples will become clearer.
0.3.2 Input and output of spatial data
Reading spatial layers from a file into an R data structure, or writing the R data structure into a file, are handled by external libraries:
- GDAL/OGR is used for reading/writing vector and raster files, with
- PROJ is used for handling CRS, in both
- Working with specialized formats, e.g., HDF with gdalUtils or NetCDF with
Package sf combined with RPostgreSQL can be used to read from, and write to, a PostGIS spatial database:
    dat
    ## Simple feature collection with 5 features and 1 field
    ## geometry type:  POINT
    ## dimension:      XY
    ## bbox:           xmin: 35.1397 ymin: 31.44711 xmax: 35.67976 ymax: 32.77013
    ## geographic CRS: WGS 84
    ##         name_lat                  geometry
    ## 1    Iris haynei POINT (35.67976 32.77013)
    ## 2    Iris haynei   POINT (35.654 32.74137)
    ## 3 Iris atrofusca POINT (35.19337 31.44711)
    ## 4 Iris atrofusca POINT (35.18914 31.51475)
    ## 5  Iris vartanii  POINT (35.1397 31.47415)
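The code that produced the printed layer is not shown here. A minimal sketch of how such a query might look is given below. The host, database name, credentials, and the table name plants are all hypothetical placeholders, and the column names are only inferred from the printed output.

```r
library(sf)
library(RPostgreSQL)

# All connection details and the table name are hypothetical placeholders
read_plants = function(host, dbname, user, password) {
  con = dbConnect(
    PostgreSQL(),
    host = host, dbname = dbname, user = user, password = password
  )
  # Make sure the connection is closed when the function exits
  on.exit(dbDisconnect(con))
  # st_read accepts a database connection and an SQL query
  st_read(con, query = "SELECT name_lat, geometry FROM plants LIMIT 5;")
}

# Usage (requires access to a running PostGIS database):
# dat = read_plants("localhost", "gisdb", "reader", "secret")
```

Wrapping the query in a function keeps the connection handling in one place, so the connection is closed even if the query fails.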
sf: Processing Vector Layers
GEOS is used for geometric operations on vector layers with sf:
- Numeric operators: Area, Length, Distance…
- Logical operators: Contains, Within, Within distance, Crosses, Overlaps, Equals, Intersects, Disjoint, Touches…
- Geometry generating operators: Centroid, Buffer, Intersection, Union, Difference, Convex-Hull, Simplification…
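The three groups of operators listed above can be sketched with one function from each group. The two squares are made-up geometries in plain planar coordinates, with no CRS assigned.

```r
library(sf)

# Two overlapping 10x10 squares (made-up planar coordinates, no CRS)
a = st_sfc(st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))))
b = st_sfc(st_polygon(list(rbind(c(5, 5), c(15, 5), c(15, 15), c(5, 15), c(5, 5)))))

# Numeric operator: area of the first square
area_a = st_area(a)

# Logical operator: do the two squares intersect?
overlap = st_intersects(a, b, sparse = FALSE)

# Geometry generating operators: the shared part, and a buffer around it
shared = st_intersection(a, b)
shared_buf = st_buffer(shared, dist = 1)
```

With real data the layers would normally carry a CRS, in which case st_area returns values with measurement units attached.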
stars: Processing Rasters
Geometric operations on rasters can be done with package stars:
- Accessing cell values: As matrix / array, Extracting to points / lines / polygons
- Raster algebra: Arithmetic (-, …), Math (log10, …), logical (>, …), summary (max, …), Masking
- Changing resolution and extent: Cropping, Mosaic, Resampling, Reprojection
- Transformations: Raster <-> Points / Contour lines / Polygons
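A few of the raster operations listed above can be sketched on a tiny raster. The raster is built from a made-up matrix rather than read from a file, so the example is self-contained.

```r
library(stars)

# A small 5x4 raster built from a matrix (made-up values 1..20)
m = matrix(1:20, nrow = 5, ncol = 4)
dim(m) = c(x = 5, y = 4)
r = st_as_stars(m)

# Accessing cell values as a matrix
vals = r[[1]]

# Raster algebra: arithmetic, math and logical operators work cell by cell
r_double = r * 2
r_log = log10(r)
r_high = r > 10

# Masking: set the cells that satisfy a condition to NA
r_masked = r
r_masked[r > 10] = NA

# Summary of the remaining values
max(r_masked[[1]], na.rm = TRUE)
```

The same expressions apply to rasters read from files, since the operators act on the cell values regardless of where the raster came from.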
geosphere: Geometric calculations on longitude/latitude
geosphere implements spherical geometry functions for distance- and direction-related calculations on geographic coordinates (lon-lat).
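As a rough sketch of such calculations, the example below measures the distance and the initial direction between two lon-lat points. The coordinates are those of the first and third points in the layer printed earlier in this section. The choice of functions is only one of several options the package offers.

```r
library(geosphere)

# Two points given as c(longitude, latitude)
p1 = c(35.67976, 32.77013)
p2 = c(35.19337, 31.44711)

# Great-circle distance on a sphere, in meters
d_hav = distHaversine(p1, p2)

# Geodesic distance on the WGS84 ellipsoid, in meters
d_geo = distGeo(p1, p2)

# Initial bearing (direction of travel) from p1 to p2, in degrees
b = bearing(p1, p2)
```

The spherical and ellipsoidal distances differ slightly, because the former treats the Earth as a perfect sphere.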
gstat: Geostatistical Modelling
As mentioned above, R was initially developed for statistical computing (Section 0.2). Accordingly, there is an extensive set of R packages for spatial statistics. For example, package gstat provides a comprehensive set of functions for univariate and multivariate geostatistics, mainly for the purpose of spatial interpolation:
- Variogram modelling
- Ordinary and universal point or block (co)kriging
We are going to learn about the gstat package in Chapter 12. An introduction to the package can also be found in Chapter 8 of Applied Spatial Data Analysis with R (Bivand, Pebesma, and Gomez-Rubio 2013).
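A minimal sketch of the variogram modelling and ordinary kriging workflow is shown below. It assumes the meuse sample dataset that ships with the sp package, which is not necessarily the data used later in this book, and the initial variogram parameters are rough guesses.

```r
library(sp)
library(gstat)

# Sample point data (zinc concentrations) and a prediction grid from package sp
data(meuse)
coordinates(meuse) = ~ x + y
data(meuse.grid)
gridded(meuse.grid) = ~ x + y

# Empirical variogram of log-transformed zinc concentration
v = variogram(log(zinc) ~ 1, meuse)

# Fit a spherical model; the initial values (sill, range, nugget) are rough guesses
m = fit.variogram(v, vgm(1, "Sph", 900, 1))

# Ordinary kriging: predict on the grid using the fitted model
k = krige(log(zinc) ~ 1, meuse, meuse.grid, model = m)
```

The result contains the predicted values (var1.pred) and the kriging variance (var1.var) for each grid cell.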
spdep: Spatial dependence modelling
Modelling with spatial weights:
- Building neighbor lists and spatial weights
- Tests for spatial autocorrelation for areal data (e.g., Moran’s I)
- Spatial regression models (e.g., SAR, CAR)
The spdep package is beyond the scope of this book. An introduction to the package can be found in Chapter 9 of Applied Spatial Data Analysis with R (Bivand, Pebesma, and Gomez-Rubio 2013).
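Just to give a flavor of the workflow, the sketch below builds neighbors and spatial weights for a made-up 5x5 grid of cells and tests for spatial autocorrelation with Moran's I. The grid and the values are invented for the example; real analyses would typically derive the neighbors from a polygon layer.

```r
library(spdep)

# Neighbor list for a hypothetical regular 5x5 grid (queen contiguity)
nb = cell2nb(5, 5, type = "queen")

# Row-standardized spatial weights
lw = nb2listw(nb, style = "W")

# Made-up values forming a gradient across the grid,
# so that neighboring cells have similar values
x = rep(1:5, each = 5)

# Moran's I test for spatial autocorrelation
mt = moran.test(x, lw)
mt
```

Because the values change gradually across the grid, the Moran's I statistic is expected to be positive.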
spatstat: Spatial point pattern analysis
spatstat provides a comprehensive collection of techniques for statistical analysis of spatial point patterns, such as:
- Kernel density estimation
- Detection of clustering using Ripley’s K-function
- Spatial logistic regression
The book Spatial Point Patterns: Methodology and Applications with R (Baddeley, Rubak, and Turner 2015) provides a thorough introduction to the subject of point pattern analysis using the spatstat package. A briefer introduction can also be found in Chapter 7 of Applied Spatial Data Analysis with R (Bivand, Pebesma, and Gomez-Rubio 2013).
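Two of the techniques mentioned above can be sketched on a simulated pattern. The points are randomly generated in the unit square, purely for illustration.

```r
library(spatstat)

# Simulate 100 uniformly distributed random points in the unit square
set.seed(1)
X = runifpoint(100)

# Kernel density estimation: returns a pixel image of point intensity
d = density(X)

# Ripley's K-function: compares the observed pattern
# with the expectation under complete spatial randomness
K = Kest(X)
```

Both results can be passed to plot() for visual inspection.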
osmdata: Access to OpenStreetMap data
osmdata gives access to OpenStreetMap (OSM) data, the most extensive open-source map database in the world, using the Overpass API.
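    library(sf)
    library(osmdata)

    # Query OSM for all "highway" features within the Beer-Sheva bounding box
    q = opq(bbox = "Beer-Sheva, Israel")
    q = add_osm_feature(q, key = "highway")
    dat = osmdata_sf(q)

    # Combine the line features with polygon outlines converted to lines
    lines = dat$osm_lines
    pol = dat$osm_polygons
    pol = st_cast(pol, "MULTILINESTRING")
    pol = st_cast(pol, "LINESTRING")
    lines = rbind(lines, pol)

    # Keep the "highway" attribute, reproject to UTM zone 36N, and plot
    lines = lines[, "highway"]
    lines = st_transform(lines, 32636)
    plot(lines, key.pos = 4, key.width = lcm(4), main = "")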
The ggplot2 package is one of the most popular packages in R. It provides advanced visualization methods through a well-designed and consistent syntax. The package supports visualization of both vector layers and rasters.
The ggplot2 package is highly customizable and capable of producing publication-quality figures and maps as well as original and innovative designs (Figure 0.13). One of its strengths is the easy preparation of “small-multiple” figures, or facets in the terminology of ggplot2 (Figure 0.14).
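A minimal sketch of a faceted map is shown below. It assumes the nc sample layer that ships with the sf package, which is not the data behind the figures referenced above. The column names births and period are introduced only for this example.

```r
library(sf)
library(ggplot2)

# Sample polygon layer of North Carolina counties, shipped with package sf
nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Reshape two attributes (births in 1974 and in 1979) into "long" form,
# so that each period can be drawn in a separate facet
a = st_sf(births = nc$BIR74, period = rep("1974", nrow(nc)), geometry = st_geometry(nc))
b = st_sf(births = nc$BIR79, period = rep("1979", nrow(nc)), geometry = st_geometry(nc))
nc_long = rbind(a, b)

# One map per period, sharing the same color scale
p = ggplot(nc_long) +
  geom_sf(aes(fill = births)) +
  facet_wrap(~ period)
```

Printing p draws the two maps side by side, one facet per value of period.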
mapview: Web mapping
- Commonly used basemaps
- Color scales and legends
mapview produces an interactive map given a spatial object. The zcol parameter is used to specify the attribute used for symbology.
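A minimal sketch of using zcol is given below, assuming the nc sample layer that ships with the sf package (not necessarily the layer used in the original example).

```r
library(sf)
library(mapview)

# Sample polygon layer of North Carolina counties, shipped with package sf
nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Interactive map where polygons are colored according to the BIR74 attribute
m = mapview(nc, zcol = "BIR74")

# In an interactive session, printing the object displays the map:
# m
```

Omitting zcol draws all features with a single default color.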
0.4 Other materials
This section lists some other resources that are relevant for working with spatial data in R.
- Model-based Geostatistics (Diggle and Ribeiro 2007)
- A Practical Guide to Geostatistical Mapping (Hengl 2009)
- Spatial Data Analysis in Ecology and Agriculture using R (1st ed. 2012, 2nd ed. 2018) (Plant 2018)
- Learning R for Geospatial Analysis (Dorman 2014)
- Applied Spatial Data Analysis with R (1st ed. 2008, 2nd ed. 2013) (Bivand, Pebesma, and Gomez-Rubio 2013)
- Hierarchical Modeling and Analysis for Spatial Data (1st ed. 2003, 2nd ed. 2014) (Banerjee, Carlin, and Gelfand 2014)
- An Introduction to R for Spatial Analysis and Mapping (1st ed. 2015, 2nd ed. 2018) (Brunsdon and Comber 2015)
- Spatial Point Patterns: Methodology and Applications with R (2015) (Baddeley, Rubak, and Turner 2015)
- Displaying Time Series, Spatial, and Space-Time Data with R (1st ed. 2014, 2nd ed. 2018) (Lamigueiro 2014)
- Predictive Soil Mapping with R (Hengl and MacMillan 2019)
- Geocomputation with R (Lovelace, Nowosad, and Muenchow 2019)
- Spatial Data Science (2021?)
0.4.3 Courses and tutorials
- GEOG 4/595: Geographic Data Analysis
- CP6521 Advanced GIS
- ES214 Introduction to GIS and Spatial Analysis
- GEOG 4/590: R for Earth-System Science
- Spatial Data Science with R (Robert J. Hijmans)
- Introduction to Spatial Data Programming with R (this course)
- GISC 422 Spatial Analysis and Modelling
- CASA0005 Geographic Information Systems and Science
- Geospatial Data Science with R
- Data Carpentry Workshops
- GIS in R (Nick Eubank)
- NEON Data Tutorials
- Learn Spatial Analysis (University of Chicago)
- WUR Geoscripting
- Mapping in R
- Spatial Analysis notes
- Classifying Satellite Imagery in R
- Fundamentals of Spatial Analysis in R
- Handling and Analyzing Vector and Raster Data Cubes with R
Baddeley, Adrian, Ege Rubak, and Rolf Turner. 2015. Spatial Point Patterns: Methodology and Applications with R. CRC Press.
Banerjee, Sudipto, Bradley P Carlin, and Alan E Gelfand. 2014. Hierarchical Modeling and Analysis for Spatial Data. CRC Press.
Bivand, Roger S., Edzer Pebesma, and Virgilio Gomez-Rubio. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY. https://asdar-book.org/.
Brunsdon, Chris, and Lex Comber. 2015. An Introduction to R for Spatial Analysis and Mapping. Sage.
Diggle, Peter, and Paulo Justiniano Ribeiro. 2007. Model-Based Geostatistics. Springer.
Dorman, Michael. 2014. Learning R for Geospatial Analysis. Packt Publishing Ltd.
Hengl, Tomislav. 2009. “A Practical Guide to Geostatistical Mapping.”
Hengl, Tomislav, and Robert A MacMillan. 2019. Predictive Soil Mapping with R. Lulu.com.
Lamigueiro, Óscar Perpiñán. 2014. Displaying Time Series, Spatial, and Space-Time Data with R. CRC Press.
Lovelace, Robin, Jakub Nowosad, and Jannes Muenchow. 2019. Geocomputation with R. CRC Press.
Plant, Richard E. 2018. Spatial Data Analysis in Ecology and Agriculture Using R. CRC Press.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer. https://ggplot2-book.org/.