Working with Spatial Data in Python

OpenGeoHub Summer School Poznań, 2023

Author: Michael Dorman

Published: August 25, 2023

1 Introduction

1.1 What is Python and why use it for spatial data?

Python is a general-purpose programming language, used for a wide variety of purposes such as building web applications, scientific computing, machine learning, and scripting within other software.

Python is also widely used for working with spatial data and for spatial analysis. The advantages of working with spatial data through programming, rather than through a graphical interface (Figure 1.1), include:

  • Easier automation and reproducibility of our workflows
  • “Forcing” the user to gain a deeper understanding of the underlying data and of the computational algorithms behind GIS workflows

Figure 1.1: QGIS, a Graphical User Interface (GUI) (left), vs. Python code in a Jupyter notebook, a Command Line Interface (CLI) (right)

Specific advantages of Python for working with spatial data, over alternative programming languages, include:

  • Python is a widespread and extremely popular language, in the GIS industry as well as in other industries
  • The Python syntax was designed to be clear and straightforward
  • Python is also used to automate GIS (and other) software, such as ArcGIS Pro (ArcPy) and QGIS (PyQGIS)
  • Deep learning libraries, such as Keras/TensorFlow and PyTorch, are almost exclusively accessed through Python

1.2 What are we going to do in the tutorial?

In this tutorial, we will demonstrate the basic concepts of working with spatial data—vector layers and rasters—in Python. We will see code examples to:

  • Import spatial data into the Python environment
  • Examine and plot the data
  • Perform geospatial calculations
  • Export the results
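
As a rough preview, the four steps above can be sketched with geopandas. This is only a minimal sketch under stated assumptions: the two points, their coordinates, the EPSG:32633 coordinate reference system, and the variable names are made up for illustration and are not part of the tutorial data. To stay self-contained, the layer is built in memory and exported to a GeoJSON string; with real files, gpd.read_file and the .to_file method are the file-based counterparts.

```python
import geopandas as gpd

# Hypothetical sample data (not from the tutorial files):
# two points in GeoJSON-like form, with coordinates in meters
features = [
    {"type": "Feature", "properties": {"name": "A"},
     "geometry": {"type": "Point", "coordinates": [630000, 5810000]}},
    {"type": "Feature", "properties": {"name": "B"},
     "geometry": {"type": "Point", "coordinates": [633000, 5814000]}},
]

# 1. Import: build a vector layer (a GeoDataFrame);
#    the projected CRS (UTM zone 33N) is an assumption for this example
pnt = gpd.GeoDataFrame.from_features(features, crs="EPSG:32633")

# 2. Examine: print the attribute table and the CRS
#    (pnt.plot() would draw the layer, if matplotlib is installed)
print(pnt)
print(pnt.crs)

# 3. Calculate: distance (in km) from the first point to every point,
#    and 1000 m buffers around the points
first_point = pnt.geometry.iloc[0]
pnt["dist_km"] = pnt.distance(first_point) / 1000
buf = pnt.copy()
buf["geometry"] = pnt.buffer(1000)

# 4. Export: serialize the buffered layer to a GeoJSON string
geojson_text = buf.to_json()
print(geojson_text[:80])
```

Each step is covered properly, with real data, in the following chapters.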

We are going to start with vector layers (Chapter 2), then move on to rasters (Chapter 3). In terms of Python packages, we are going to use the two most fundamental ones:

  • geopandas, for working with vector layers
  • rasterio, for working with rasters

The geopandas package extends pandas, a package for working with (non-spatial) tables which we will also use in the tutorial (Figure 1.2). Both pandas and rasterio depend on numpy, the most important package for working with data in Python, which provides the n-dimensional array data structure. Finally, geopandas also depends on shapely, a Python interface to GEOS, a widely used computational geometry library that powers most open-source GIS software.
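
These relationships can be checked directly in code. The following is a small illustrative sketch, with a made-up table and hypothetical variable names. It shows that a pandas column can be obtained as a numpy array, that a GeoDataFrame is a special kind of pandas DataFrame, and that its geometry column holds shapely geometries. rasterio is left out because demonstrating it requires a raster file; reading a raster band with rasterio likewise returns a numpy array.

```python
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# A plain (non-spatial) pandas table; the values are made up for illustration
df = pd.DataFrame({"name": ["A", "B"], "x": [1.0, 2.0], "y": [3.0, 4.0]})

# pandas -> numpy: a table column can be obtained as a numpy array
x_values = df["x"].to_numpy()
print(isinstance(x_values, np.ndarray))

# geopandas -> pandas: a GeoDataFrame is a DataFrame with a geometry column
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["x"], df["y"]))
print(isinstance(gdf, pd.DataFrame))

# geopandas -> shapely: each element of the geometry column is a shapely geometry
first_geom = gdf.geometry.iloc[0]
print(isinstance(first_geom, Point))
```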

Figure 1.2: Main dependencies between the Python packages in the tutorial

1.3 Reproducing the results

To reproduce the following code examples:

  • Download tutorial.zip and extract the files
  • Open the Anaconda Prompt (Miniconda3) program, or any other Python environment with the notebook, geopandas and rasterio packages installed
  • Navigate to the tutorial directory with the extracted files, using a command such as cd C:\Users\dorman\Downloads\tutorial
  • Run the command jupyter notebook to open the Jupyter Notebook interface
  • Open the file vector.ipynb or raster.ipynb
  • Run the code

For more details, see Chapter 5.
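
Before launching Jupyter, it can be useful to confirm that the packages named in the steps above are available in the active Python environment. The short script below is one possible way to do so, using only the standard library; the helper name find_missing is hypothetical and not part of the tutorial materials.

```python
import importlib.util

# Packages named in the steps above
required = ["notebook", "geopandas", "rasterio"]

def find_missing(packages):
    """Return the names of packages that cannot be located in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]

missing = find_missing(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

The script only checks that each package can be located, without importing it, so it runs quickly even in a large environment.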