Setup: sample data

For more on setting up the environment and sample data, see the preparation document.

Table 0.1: Sample data

| Data | File(s) | Format | Source |
|---|---|---|---|
| “Nafot” | nafot.shp (+7) | Shapefile | https://www.gov.il/he/Departments/Guides/info-gis |
| Railways | RAIL_STRATEGIC.shp (+7) | Shapefile | https://data.gov.il/dataset/rail_strategic |
| Statistical areas | statisticalareas_demography2018.gdb | GDB | https://www.cbs.gov.il/he/Pages/geo-layers.aspx |

The data for this tutorial can be downloaded from:

https://github.com/michaeldorman/R-Spatial-Workshop-at-CBS-2021/raw/main/data.zip

A script with the R code of this document is available here:

https://github.com/michaeldorman/R-Spatial-Workshop-at-CBS-2021/raw/main/main.R

All of the materials are also available on GitHub.

Please feel free to ask questions as we go along!

1 R for Spatial Data Analysis

1.1 Software for analysis of spatial data

Software in general, and software for spatial analysis in particular, is characterized by two types of interfaces:

  • Graphical User Interface (GUI) (Figure 1.1)
  • Command Line Interface (CLI) (Figure 1.2)

In a GUI, our interaction with the computer is restricted to a predefined set of input elements, such as buttons, menus, and dialog boxes. In a CLI, we interact with the computer by writing code, which means that our instructions are practically unconstrained. In other words, with a CLI, we can give the computer specific instructions to do anything we want.

R, the subject of this tutorial, is an example of CLI software for working with (among other things) spatial data.


Figure 1.1: QGIS, an example of Graphical User Interface (GUI) software


Figure 1.2: R, an example of Command Line Interface (CLI) software

1.2 What is R?

R is a programming language and environment, originally developed for statistical computing and graphics. Notable advantages of R are that it is a full-featured programming language yet customized for working with data, that it is relatively simple, and that it has a huge collection of ~16,000 packages from various areas of interest in its official repository (CRAN).

Over time, an increasing number of contributed packages for handling and analyzing spatial data became available in R. Today, spatial analysis is a major functionality in R. As of October 2020, there were ~185 packages specifically addressing spatial analysis in R, and many more that are indirectly related to spatial data.


Figure 1.3: Books on Spatial Data Analysis with R

1.3 History of spatial analysis in R

Some important events in the history of spatial analysis support in R are summarized in Table 1.1.

Table 1.1: Significant events in the history of R-spatial

| Year | Event |
|---|---|
| pre-2003 | Variable and incomplete approaches (MASS, spatstat, maptools, geoR, splancs, gstat, …) |
| 2003 | Consensus that a package defining standard data structures would be useful; rgdal released on CRAN |
| 2005 | sp released on CRAN; sp support in rgdal |
| 2008 | Applied Spatial Data Analysis with R, 1st ed. |
| 2010 | raster released on CRAN |
| 2011 | rgeos released on CRAN |
| 2013 | Applied Spatial Data Analysis with R, 2nd ed. |
| 2016 | sf released on CRAN (Section 2.1) |
| 2018 | stars released on CRAN |
| 2019 | Geocomputation with R (https://geocompr.robinlovelace.net/) |
| 2021(?) | Spatial Data Science (https://www.r-spatial.org/book/) |

1.4 R as a GIS?

A question that arises, at this point, is: can R be used as a Geographic Information System (GIS), or as a comprehensive toolbox for doing spatial analysis? The answer is definitely yes. Moreover, R has some important advantages over traditional approaches to GIS, i.e., software with GUIs such as ArcGIS or QGIS.

General advantages of Command Line Interface (CLI) software include:

  • Automation—Doing otherwise infeasible repetitive tasks
  • Reproducibility—Precise control of instructions to the computer
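As a small illustration of automation, the following is a minimal sketch that applies the same operation to every file in a folder. It assumes the sf package is installed. Since the tutorial's own data are not loaded at this point, it uses the nc.shp sample layer shipped with sf, split into three hypothetical part*.gpkg files in a temporary folder that stand in for a folder of input layers.

```r
library(sf)

# Sample layer shipped with sf (North Carolina counties)
nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Hypothetical input folder: three GeoPackage files with 10 counties each
in_dir = tempfile("layers")
dir.create(in_dir)
for (i in 1:3) {
  rows = ((i - 1) * 10 + 1):(i * 10)
  st_write(nc[rows, ], file.path(in_dir, paste0("part", i, ".gpkg")), quiet = TRUE)
}

# The repetitive task: read each file and count its features
files = list.files(in_dir, pattern = "\\.gpkg$", full.names = TRUE)
result = data.frame(
  file = basename(files),
  n = vapply(files, function(f) nrow(st_read(f, quiet = TRUE)), integer(1), USE.NAMES = FALSE)
)
result
```

The same loop works unchanged for 3 files or 3,000, which is the point of the "automation" advantage.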

Moreover, specific strengths of R as a GIS are:

  • R capabilities in data processing, statistics, and visualization, combined with dedicated packages for spatial data
  • A single environment encompassing all analysis aspects—acquiring data, computation, statistics, visualization, Web, etc.

Nevertheless, there are situations when other tools are needed:

  • Interactive editing or georeferencing (but see the mapedit package)
  • Unique GIS algorithms (3D analysis, label placement, splitting lines at intersections)
  • Data that cannot fit in RAM (but R can connect to spatial databases[1] and other software for working with big data)

The following sections (1.5 to 1.11) highlight some of the capabilities of spatial data analysis packages in R, through short examples.

1.5 sf and stars

In both sf and stars, reading spatial layers from a file into an R data structure, writing an R data structure to a file, and handling coordinate reference systems are delegated to external libraries:

  • GDAL/OGR is used for reading/writing vector and raster files, with sf and stars
  • PROJ is used for handling Coordinate Reference Systems (CRS), in both sf and stars
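A minimal sketch of where GDAL and PROJ come into play, assuming sf and stars are installed. Instead of the tutorial data, it uses sample files shipped with the packages (nc.shp in sf, L7_ETMs.tif in stars); the target CRS (EPSG:32617, UTM zone 17N) is an arbitrary choice for illustration.

```r
library(sf)
library(stars)

# Vector: read a Shapefile through GDAL/OGR
nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
st_crs(nc)  # CRS as detected from the accompanying .prj file

# Reproject through PROJ, here to UTM zone 17N (EPSG:32617)
nc_utm = st_transform(nc, 32617)

# Write through GDAL to a different format (GeoPackage)
out = tempfile(fileext = ".gpkg")
st_write(nc_utm, out, quiet = TRUE)

# Raster: read a multi-band GeoTIFF through GDAL
r = read_stars(system.file("tif/L7_ETMs.tif", package = "stars"))
dim(r)  # x, y and band dimensions
```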

1.6 sf: Vector Layers

GEOS is used for geometric operations on vector layers with sf:

  • Numeric operators—Area, Length, Distance…
  • Logical operators—Contains, Within, Within distance, Crosses, Overlaps, Equals, Intersects, Disjoint, Touches…
  • Geometry generating operators—Centroid, Buffer (Figure 1.4), Intersection, Union, Difference, Convex hull, Simplification…

Figure 1.4: Buffer function
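To make the three groups of GEOS-based operators concrete, here is a minimal sketch on hand-made geometries, assuming sf is installed. The two 2x2 squares and two points are hypothetical and have no CRS, so all results are in plain planar units.

```r
library(sf)

# Two overlapping 2x2 squares and two points (no CRS, planar units)
sq1 = st_sfc(st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0)))))
sq2 = st_sfc(st_polygon(list(rbind(c(1, 1), c(3, 1), c(3, 3), c(1, 3), c(1, 1)))))
p1 = st_sfc(st_point(c(0.5, 0.5)))
p2 = st_sfc(st_point(c(4.5, 0.5)))

# Numeric operators
st_area(sq1)                                             # 4
st_distance(p1, p2)                                      # 4, as a 1x1 matrix

# Logical operators
st_contains(sq1, p1, sparse = FALSE)                     # TRUE
st_is_within_distance(p1, p2, dist = 5, sparse = FALSE)  # TRUE

# Geometry generating operators
st_area(st_intersection(sq1, sq2))                       # 1
st_area(st_union(sq1, sq2))                              # 7
st_area(st_buffer(p1, dist = 1))   # slightly less than pi: the circle is a polygon approximation
st_coordinates(st_centroid(sq1))                         # X = 1, Y = 1
```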

1.7 stars: Rasters

Raster operations can be done with the stars package:

  • Accessing cell values—As matrix / array, Extracting to points / lines / polygons
  • Raster algebra—Arithmetic (+, -, …), Math (sqrt, log10, …), logical (!, ==, >, …), summary (mean, max, …), Masking
  • Changing resolution and extent—Cropping, Mosaic, Resampling, Reprojection (Figure 1.5)
  • Transformations—Raster <-> Points / Contour lines / Polygons

Figure 1.5: Reprojection of the MODIS NDVI raster from Sinusoidal (left) to ITM (right)
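The following minimal sketch shows cell access and raster algebra on a tiny hypothetical raster, assuming stars is installed. The 5 x 4 raster is built directly from a matrix of the values 1 to 20 and has no CRS; masking is done by assigning NA to cells selected with a logical stars object.

```r
library(stars)

# A small 5 x 4 raster built from a matrix with named dimensions
m = matrix(1:20, nrow = 5, ncol = 4)
dim(m) = c(x = 5, y = 4)
s = st_as_stars(m)

# Accessing cell values as an array
s[[1]]

# Raster algebra: arithmetic, math and logical operators
s2 = s * 2
s_sqrt = sqrt(s)
s_mask = s > 10

# Summary of cell values
mean(s[[1]])  # 10.5

# Masking: set cells with values <= 10 to NA
masked = s
masked[masked <= 10] = NA
```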

1.8 gstat: Interpolation

Univariate and multivariate geostatistics:

  • Variogram modelling
  • Ordinary and universal point or block (co)kriging (Figure 1.6)
  • Cross-validation

Figure 1.6: Predicted Zinc concentration, using Ordinary Kriging
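A minimal sketch of variogram modelling followed by ordinary kriging, assuming gstat, sf and sp (for its data) are installed. It uses the Meuse river dataset shipped with sp, which is presumably the kind of data behind Figure 1.6; the initial variogram parameters (partial sill 1, range 900 m, nugget 1) are only starting values for the fit.

```r
library(sf)
library(gstat)

# Meuse data (package sp): zinc at 155 points, plus a prediction grid
data(meuse, package = "sp")
data(meuse.grid, package = "sp")
pnts = st_as_sf(meuse, coords = c("x", "y"), crs = 28992)
grid = st_as_sf(meuse.grid, coords = c("x", "y"), crs = 28992)

# Variogram modelling: empirical variogram of log(zinc), then a fitted spherical model
v = variogram(log(zinc) ~ 1, pnts)
fit = fit.variogram(v, vgm(psill = 1, model = "Sph", range = 900, nugget = 1))

# Ordinary kriging (intercept-only formula) at the grid points
pred = krige(log(zinc) ~ 1, pnts, grid, model = fit)
head(pred)  # columns var1.pred (prediction) and var1.var (kriging variance)
```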

1.9 spdep: Spatial dependence

Modelling with spatial weights:

  • Building neighbor lists (Figure 1.7) and spatial weights
  • Tests for spatial autocorrelation for areal data (e.g. Moran’s I)
  • Spatial regression models (e.g. SAR, CAR)

Figure 1.7: Neighbors list based on regions with contiguous boundaries
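A minimal sketch of the spdep workflow (neighbor list, spatial weights, Moran's I), assuming spdep and sf are installed. It uses the nc.shp counties layer shipped with sf and the BIR74 (births in 1974) attribute purely as an example variable; zero.policy = TRUE is set defensively in case some region has no neighbors.

```r
library(sf)
library(spdep)

nc = st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Neighbor list: regions sharing a boundary (queen contiguity, the default)
nb = poly2nb(nc)

# Row-standardized spatial weights
lw = nb2listw(nb, style = "W", zero.policy = TRUE)

# Moran's I test for spatial autocorrelation
mt = moran.test(nc$BIR74, lw, zero.policy = TRUE)
mt
```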

1.10 spatstat: Point patterns

Techniques for statistical analysis of spatial point patterns (Figure 1.8), such as:

  • Kernel density estimation
  • Detection of clustering using Ripley’s K-function
  • Spatial logistic regression

Figure 1.8: Distance map for the Biological Cells point pattern dataset
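A minimal sketch with spatstat, assuming the package is installed. It uses the cells dataset (42 biological cell centres), which also appears in Figure 1.8; the kernel bandwidth sigma = 0.1 is an arbitrary choice for illustration.

```r
library(spatstat)

# 'cells': point pattern of biological cell centres in the unit square
data(cells)

# Kernel density estimation (returns a pixel image)
d = density(cells, sigma = 0.1)

# Ripley's K-function, to assess clustering vs. regularity
K = Kest(cells)

# Distance map: distance from each pixel to the nearest point
dm = distmap(cells)
```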

1.11 RPostgreSQL: PostGIS

We may want to combine R with a (spatial) database when:

  • working with big spatial data, and/or
  • data are collaboratively prepared by many users (e.g., in a large organization).

Package sf (Section 2.1) combined with RPostgreSQL can be used to read from, and write to, a PostGIS spatial database. First, we need to create a connection object:

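library(sf)          # reading and writing spatial data, including PostGIS tables
library(RPostgreSQL) # PostgreSQL driver for the DBI interface

# Connection object (the password is masked here)
con = dbConnect(
  PostgreSQL(),
  dbname = "gisdb",
  host = "159.89.13.241",
  port = 5432,
  user = "geobgu",
  password = "*******"
)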

Then, we can read from the database, just like from a file, using the st_read function (Section 2.3); writing to the database is done with st_write:

st_read(con, query = "SELECT name_lat, geometry FROM plants LIMIT 3;")
## Loading required package: DBI
## Simple feature collection with 3 features and 1 field
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 35.19337 ymin: 31.44711 xmax: 35.67976 ymax: 32.77013
## geographic CRS: WGS 84
##         name_lat                  geometry
## 1    Iris haynei POINT (35.67976 32.77013)
## 2    Iris haynei   POINT (35.654 32.74137)
## 3 Iris atrofusca POINT (35.19337 31.44711)
dbDisconnect(con)
## [1] TRUE

1.12 Other examples

2 Spatial data structures

2.1 The sf package

The sf package (Figure 2.1), released in 2016, is a newer package for working with vector layers in R, which we are going to use in this tutorial. In recent years, sf has become the standard package for working with vector data in R, practically replacing sp, rgdal, and rgeos.


Figure 2.1: Pebesma, 2018, The R Journal (https://journal.r-project.org/archive/2018-1/)
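An sf vector layer is a data.frame with an additional geometry column. The following minimal sketch, assuming sf is installed, builds one from a hypothetical table of three named points with longitude/latitude coordinates (WGS 84).

```r
library(sf)

# Hypothetical attribute table with coordinates
df = data.frame(
  name = c("A", "B", "C"),
  lon = c(34.78, 35.21, 34.99),
  lat = c(32.08, 31.77, 32.79)
)

# Convert to an sf layer: the coordinate columns become a geometry list-column
pnt = st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
class(pnt)              # "sf" "data.frame"
st_geometry(pnt)        # the geometry column, an 'sfc' object
st_drop_geometry(pnt)   # attributes only, as a plain data.frame
```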

One of the important innovations in sf is a complete implementation of the Simple Features standard. Since 2003, Simple Features have been widely implemented in spatial databases (such as PostGIS) and commercial GIS (e.g., ESRI ArcGIS), and they form the vector data basis for libraries such as GDAL. The Simple Features standard defines several types of geometries, of which seven are most commonly used in the world of GIS and spatial data analysis (Figure 2.2). When working with spatial databases, Simple Features are commonly specified as Well Known Text (WKT).


Figure 2.2: Seven Simple Feature geometry types most commonly used in GIS (see also: https://r-spatial.github.io/sf/articles/sf1.html)
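To connect WKT with the seven geometry types, the following minimal sketch, assuming sf is installed, parses one hypothetical WKT string per type and converts a geometry back to WKT.

```r
library(sf)

# One WKT string for each of the seven common geometry types
wkt = c(
  "POINT (30 10)",
  "LINESTRING (30 10, 10 30, 40 40)",
  "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
  "MULTIPOINT ((10 40), (40 30))",
  "MULTILINESTRING ((10 10, 20 20), (15 15, 30 15))",
  "MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))",
  "GEOMETRYCOLLECTION (POINT (40 10), LINESTRING (10 10, 20 20))"
)

# Parse WKT into a geometry column and list the geometry types
g = st_as_sfc(wkt)
st_geometry_type(g)

# Convert a single geometry back to WKT
st_as_text(g[[1]])  # "POINT (30 10)"
```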

The sf package depends on several external software components (installed along with the R package[2]), most importantly GDAL, GEOS and PROJ (Figure 2.3). These well-tested and popular open-source components are also used by numerous open-source and commercial software packages for spatial analysis, such as QGIS and PostGIS.
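When sf is loaded, it reports the GEOS, GDAL and PROJ versions it is linked to. The same information can be queried at any time, as in this minimal sketch (assuming sf is installed; the exact element names returned can vary between sf versions).

```r
library(sf)

# Versions of the external libraries sf is linked to
v = sf_extSoftVersion()
v[c("GEOS", "GDAL", "PROJ")]
```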