Working with Spatial Data in Python

BGU, 2023

Author: Michael Dorman

Published: March 21, 2023

1 Introduction

1.1 What is Python and why use it for spatial data?

Python is a general-purpose programming language, used for a wide variety of tasks such as building web applications, scientific computing and machine learning, or scripting inside other software.

Python is also widely used for working with spatial data and spatial analysis. The advantages of working with spatial data through programming include:

  • Easier automation and reproducibility of our workflows
  • “Forcing” the user to develop a deeper understanding of the underlying data and of the computational algorithms behind GIS workflows

QGIS, a Graphical User Interface (GUI) (left), vs. Python code in a Jupyter notebook, a Command Line Interface (CLI) (right)

Specific advantages of Python for working with spatial data, over alternative programming languages, include:

  • Python and its packages are free and open-source
  • The Python syntax was designed to be clear and straightforward
  • Python is a widespread and extremely popular language, in GIS as well as in other industries
  • Python is also used to automate GIS (and other) software, such as ArcGIS/ArcPro (ArcPy) and QGIS (PyQGIS)
  • Deep learning libraries, such as Keras/TensorFlow and PyTorch, are accessed almost exclusively through Python

1.2 What are we going to do in the tutorial?

In this tutorial, we will demonstrate the basic concepts of working with spatial data (vector layers and rasters) in Python. We will see code examples to:

  • Import spatial data into the Python environment
  • Examine and plot the data
  • Perform geospatial calculations
  • Export the results
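To make these four steps concrete, here is a minimal sketch using geopandas. It is not taken from the tutorial's notebooks: the three points, the column names, the CRS (EPSG:32636), the 100 m buffer distance, and the output file name are all assumptions made for illustration. The layer is built in memory rather than read from a file, so that the example does not depend on any particular dataset.

```python
import os
import tempfile

import geopandas as gpd
from shapely.geometry import Point

# 1. "Import": with real data this would typically be gpd.read_file(path).
#    Here we build a small, made-up point layer in a projected CRS (units: meters).
pnt = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[Point(0, 0), Point(1000, 0), Point(0, 2000)],
    crs="EPSG:32636",
)

# 2. Examine: print the first rows and the coordinate reference system.
#    (pnt.plot() would draw the layer; it requires matplotlib.)
print(pnt.head())
print(pnt.crs)

# 3. Calculate: 100 m buffers around each point, and the buffer areas
buffers = pnt.copy()
buffers["geometry"] = pnt.buffer(100)
buffers["area_m2"] = buffers.area

# 4. Export: write the result to a GeoJSON file in a temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "buffers.geojson")
buffers.to_file(out_path, driver="GeoJSON")
print(out_path)
```

Note that the buffer areas come out slightly below π·100², since each buffer is a polygon approximating a circle.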

We are going to start with vector layers (Chapter 2), then move on to rasters (Chapter 3). In terms of Python packages, we are going to use the two most fundamental ones:

  • geopandas, for working with vector layers
  • rasterio, for working with rasters

The geopandas package extends pandas, a package for working with (non-spatial) tables which we will also use in the tutorial. Both pandas and rasterio depend on numpy, the most important package for working with data in Python, which provides the n-dimensional array data structure. Finally, geopandas also depends on shapely, a Python interface to GEOS, a widely used computational geometry library that powers most open-source GIS software.

Main dependencies between the Python packages in the tutorial
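These relationships can be observed directly from Python. The following is a small sketch, using a made-up two-point layer (the column names and values are assumptions for illustration). It checks that a geopandas table is a pandas table, that its geometries are shapely objects, and that its column values can be obtained as numpy arrays. rasterio is left out here, since demonstrating it would require a raster file.

```python
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.geometry.base import BaseGeometry

# A tiny, made-up layer with two points (illustrative data only)
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"], "value": [1.5, 2.5]},
    geometry=[Point(0, 0), Point(1, 1)],
)

# geopandas extends pandas: a GeoDataFrame is a specialized DataFrame
is_dataframe = isinstance(gdf, pd.DataFrame)

# geopandas relies on shapely: each geometry is a shapely object
is_shapely = isinstance(gdf.geometry.iloc[0], BaseGeometry)

# pandas relies on numpy: column values can be extracted as a numpy array
values = gdf["value"].to_numpy()
is_ndarray = isinstance(values, np.ndarray)

print(is_dataframe, is_shapely, is_ndarray)
```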

1.3 Reproducing the results

To reproduce the following code examples:

  • Download tutorial.zip and extract the files
  • Open the Anaconda Prompt (Miniconda3) program
  • Navigate to the tutorial directory with the extracted files, using a command such as cd C:\Users\dorman\Downloads\tutorial
  • Run the command jupyter notebook to open the Jupyter Notebook interface
  • Open the file vector.ipynb or raster.ipynb
  • Run the code

For more details, see Chapter 4.
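Before running the notebooks, it can be useful to confirm that the packages mentioned in this chapter are available in the active environment. The sketch below uses only the Python standard library. The list of package names is an assumption based on the packages named above, and the helper function check_packages is hypothetical, not part of the tutorial materials.

```python
import importlib.util

# Packages named in this chapter (an assumed list, for illustration)
packages = ["numpy", "pandas", "shapely", "geopandas", "rasterio"]


def check_packages(names):
    """Return a dict mapping each package name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(packages)
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any package reported as MISSING needs to be installed into the environment before the corresponding notebook can run.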