Working with Spatial Data in Python
OpenGeoHub Summer School Poznań, 2023
1 Introduction
1.1 What is Python and why use it for spatial data?
Python is a general-purpose programming language used for a wide variety of purposes, such as building web applications, scientific computing and machine learning, and scripting in other software.
Python is also widely used for working with spatial data and spatial analysis. The advantages of working with spatial data through programming (Figure 1.1) include:
- Easier automation and reproducibility of our workflows
- “Forcing” the user to gain a deeper understanding of the underlying data and the computational algorithms behind GIS workflows
Specific advantages of Python for working with spatial data, over alternative programming languages, include:
- Python is a widespread and extremely popular language, in the GIS as well as other industries
- The Python syntax was designed to be clear and straightforward
- Python is also used to automate GIS (and other) software, such as ArcGIS/ArcGIS Pro (ArcPy) and QGIS (PyQGIS)
- Deep learning libraries, such as Keras/TensorFlow and PyTorch, are almost exclusively accessed through Python
1.2 What are we going to do in the tutorial?
In this tutorial, we will demonstrate the basic concepts of working with spatial data (vector layers and rasters) in Python. We will see code examples to:
- Import spatial data into the Python environment
- Examine and plot the data
- Perform geospatial calculations
- Export the results
We are going to start with vector layers (Chapter 2), then move on to rasters (Chapter 3). In terms of Python packages, we are going to use the two most fundamental ones:
- geopandas, for working with vector layers
- rasterio, for working with rasters
The geopandas package extends pandas, a package for working with (non-spatial) tables, which we will also use in the tutorial (Figure 1.2). Both pandas and rasterio depend on numpy, the most important package for working with data in Python, which provides the n-dimensional array data structure. Finally, geopandas also depends on shapely, a Python interface to GEOS, a widely used computational geometry library that powers most open-source GIS software.
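To make the role of the n-dimensional array concrete, here is a minimal sketch that uses numpy alone. The array values are made up for illustration; the (band, row, column) axis order follows the convention rasterio uses when reading a multi-band raster.

```python
import numpy as np

# Hypothetical 2-band raster with 3 rows and 4 columns,
# stored as a 3-D array in (band, row, column) order
raster = np.arange(24, dtype="float32").reshape(2, 3, 4)
print(raster.shape)

# A single band is a 2-D array
band1 = raster[0]
band2 = raster[1]
print(band1.mean())

# Cell-wise arithmetic applies to all pixels at once,
# e.g. a normalized difference between the two bands
ratio = (band2 - band1) / (band2 + band1)
print(ratio.shape)
```

Because the operations are vectorized, the same expression works unchanged on a raster with millions of cells.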
1.3 Reproducing the results
To reproduce the following code examples:
- Download tutorial.zip and extract the files
- Open the Anaconda Prompt (Miniconda3) program, or any other Python environment with the notebook, geopandas and rasterio packages installed
- Navigate to the tutorial directory with the extracted files, using a command such as cd C:\Users\dorman\Downloads\tutorial
- Run the command jupyter notebook to open the Jupyter Notebook interface
- Open the file vector.ipynb or raster.ipynb
- Run the code
For more details, see Chapter 5.
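Before launching Jupyter, it can help to confirm that the three required packages are visible to the active Python environment. The following is a small sketch using only the standard library; it checks whether each package can be found, without importing it.

```python
import importlib.util

# Packages named in the setup steps above
required = ["notebook", "geopandas", "rasterio"]

# find_spec returns None when a top-level package cannot be located
status = {name: importlib.util.find_spec(name) is not None for name in required}

for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```

Any package reported as missing needs to be installed into the environment before the notebooks will run.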