Vector layers (geopandas)

Last updated: 2022-06-26 00:31:51

Introduction

In the previous chapter, we worked with individual geometries using the shapely package (Geometries (shapely)). In GIS paractice, however, we usually work with vector layers, rather than individual geometries. Vector layers can be thought of:

  • a collection of (one or more) geometries,

  • associated with a Coordinate Reference System (CRS), and

  • associated with (zero or more) non-spatial attributes.

In this chapter, we introduce the geopandas package for working with vector layers. First, we are going to see how a vector layer can be created from scratch (see Creating a GeoDataFrame), using shapely geometries, to understand the relation between shapely geometries and geopandas data structures, namely:

  • GeoSeries—A geometry column, i.e., a collection of geometries along with a CRS definition

  • GeoDataFrame—A vector layer

Then, we go on to cover some of the basic operations and workflows with vector layers using the geopandas package:

In the next chapter, we are going to cover spatial operations with geopandas, such as calculating new geometries (buffers, etc.) and spatial join (see Geometric operations).

What is geopandas?

geopandas, as the name suggests, is an extension of pandas that enables working with spatial (geographic) data, specifically with vector layers. To facilitate the representation of spatial data, geopandas defines two data structures, which are extensions of the analogous data structures from pandas for non-spatial data:

  • GeoSeries—An extension of a Series (see Creating a Series) for storing a sequence of geometries and a Coordinate Reference System (CRS) definition, whereas each geometry is a shapely object (see Geometries (shapely))

  • GeoDataFrame—An extension of a DataFrame (see Creating a DataFrame), where one of the columns is a GeoSeries

As will be demostrated in a moment (see Creating a GeoDataFrame), a GeoDataFrame, representing a vector layer, is thus a combination (Fig. 47) of:

  • A GeoSeries, which holds the geometries in a special “geometry” column

  • Zero or more Series, which represent the non-spatial attributes

Since geopandas is based on pandas, many of the DataFrame workflows and operators we learned about earlier (see Tables (pandas) and Table reshaping and joins) work exactly the same way with a GeoDataFrame. This is especially true for operations that involve just the non-spatial properties, such as filtering by attributes (see Filtering by attributes).

_images/geodataframe.svg

Fig. 47 GeoDataFrame structure (source: https://geopandas.org/getting_started/introduction.html)

The geopandas package depends on several other Python packages, most importantly:

To put it in perspective of other packages, geopandas may be considered a high-level approach for working with vector layers in Python. A lower-level approach is provided through the fiona package, which geopandas depends on. The osgeo package provides (through its ogr object) another, more low-level, approach for vector layer access [Gar16].

Since geopandas is simpler to learn and work with than its alternatives, it is the recommended approach for the purposes in this book. For instance, using fiona and osgeo we typically have to process vector features one-by-one inside a for loop, while in geopandas the entire layer is being “read” and processed at once. Nevertheless, in some situations it is necessary to resort to lower-level approaches. For example, processing features one-by-one is essential when the layer is too big to fit in memory.

Creating from scratch

Creating a GeoSeries

Before starting to work with geopandas, we have to load it. We also load pandas, shapely.geometry, and shapely.wkt, which we are already familiar with, since they are needed for some of the examples:

import pandas as pd
import shapely.geometry
import shapely.wkt
import geopandas as gpd
/home/michael/.local/lib/python3.8/site-packages/geopandas/_compat.py:111: UserWarning: The Shapely GEOS version (3.10.2-CAPI-1.16.0) is incompatible with the GEOS version PyGEOS was compiled with (3.10.1-CAPI-1.16.0). Conversions between both will be slow.
  warnings.warn(

As a first step to learn about geopandas, we are going to create a geometry column (GeoSeries), and then a vector layer (GeoDataFrame) (see Creating a GeoDataFrame) from sratch. This will help us get a better understanding of the structure of a GeoDataFrame (Fig. 47).

Let us start with the geometry column, i.e., a GeoSeries. The geometric component of our layer is going to be a sequence of two "Polygon" geometries, hereby named pol1 and pol2 (see shapely from WKT):

pol1 = shapely.wkt.loads('POLYGON ((0 0, 0 -1, 7.5 -1, 7.5 0, 0 0))')
pol2 = shapely.wkt.loads('POLYGON ((0 1, 1 0, 2 0.5, 3 0, 4 0, 5 0.5, 6 -0.5, 7 -0.5, 7 1, 0 1))')

To create a GeoSeries, we can pass a list of shapely geometries to the gpd.GeoSeries function. The second (optional) argument passed to gpd.GeoSeries is a crs, specifying the Coordinate Reference System (CRS). In this case, we specify the WGS84 geographical CRS, using its EPSG code 4326 (Table 23):

s = gpd.GeoSeries([pol1, pol2], crs=4326)
s
0    POLYGON ((0.00000 0.00000, 0.00...
1    POLYGON ((0.00000 1.00000, 1.00...
dtype: geometry

Creating a GeoDataFrame

A GeoDataFrame can be constructed from a dict representing the columns, one of them containing the geometries, and the other(s) containing non-spatial attributes (if any). The dict values may be Series, GeoSeries, or list data structure. First, let us create the dictionary:

d = {'name': ['a', 'b'], 'geometry': s}
d
{'name': ['a', 'b'],
 'geometry': 0    POLYGON ((0.00000 0.00000, 0.00...
 1    POLYGON ((0.00000 1.00000, 1.00...
 dtype: geometry}

In this particular case, we have:

  • the (mandatory) geometry column, conventionally named 'geometry', specified using a GeoSeries (see Creating a GeoSeries), and

  • one non-spatial attribute, named 'name', specified using a list

Now, we can convert the dict to a GeoDataFrame, using the gpd.GeoDataFrame function:

gdf = gpd.GeoDataFrame(d)
gdf
name geometry
0 a POLYGON ((0.00000 0.00000, 0.00...
1 b POLYGON ((0.00000 1.00000, 1.00...

or in a single step:

gdf = gpd.GeoDataFrame({'name': ['a', 'b'], 'geometry': s})
gdf
name geometry
0 a POLYGON ((0.00000 0.00000, 0.00...
1 b POLYGON ((0.00000 1.00000, 1.00...

We can also use a list of geometries instead of a GeoSeries, in which case the crs can be specified inside gpd.GeoDataFrame:

gdf = gpd.GeoDataFrame({'name': ['a', 'b'], 'geometry': [pol1, pol2]}, crs=4326)
gdf
name geometry
0 a POLYGON ((0.00000 0.00000, 0.00...
1 b POLYGON ((0.00000 1.00000, 1.00...

Note that, the workflow to create a GeoDataFrame is very similar to the way we created a DataFrame with the pd.DataFrame function (see Creating a DataFrame). Basically, the only difference is the presence of the geometry column.

GeoDataFrame structure

Series columns

Let us examine the structure of the gdf object we just created to understand the structure of a GeoDataFrame.

An individual column of a GeoDataFrame can be accessed just like a column of a DataFrame (see Selecting DataFrame columns). For example, the following expression returns the "name" column:

gdf['name']
0    a
1    b
Name: name, dtype: object

as a Series object:

type(gdf['name'])
pandas.core.series.Series

More specifically, 'name' is a Series of type object, since it contains strings.

GeoSeries (geometry) column

The following expression returns the 'geometry' column:

gdf['geometry']
0    POLYGON ((0.00000 0.00000, 0.00...
1    POLYGON ((0.00000 1.00000, 1.00...
Name: geometry, dtype: geometry

As mentioned above, the geometry column is a GeoSeries object:

type(gdf['geometry'])
geopandas.geoseries.GeoSeries

The 'geometry' column is what differentiates a GeoDataFrame from an ordinary DataFrame.

Individual geometries

As evident from the way we constructed the object (see Creating a GeoDataFrame), each “cell” in the geometry column is an individual shapely geometry:

gdf['geometry'].iloc[0]
_images/geopandas1_53_0.svg
gdf['geometry'].iloc[1]
_images/geopandas1_54_0.svg

If necessary, for instance when applying shapely functions (see Points to line (geopandas)), the entire geometry column can be extracted as a list of shapely geometries using to_list (see Series to list):

gdf['geometry'].to_list()
[<shapely.geometry.polygon.Polygon at 0x7f0684e84490>,
 <shapely.geometry.polygon.Polygon at 0x7f0684e84610>]

GeoDataFrame properties

Geometry types

The geometry types (Fig. 37) of the geometries in a GeoDataFrame or a GeoSeries can be obtained using the .geom_type property. The result is a Series of strings, with the geometry type names. For example:

s.geom_type
0    Polygon
1    Polygon
dtype: object
gdf.geom_type
0    Polygon
1    Polygon
dtype: object

The .geom_type property of geopandas thus serves the same purpose as the .geom_type property of shapely geometries (see Geometry type).

Coordinate Reference System (CRS)

In addition to the geometries, a GeoSeries has a property named .crs, which specifies the Coordinate Regerence System (CRS) of the geometries. For example, the geometries in the gdf layer are in the geographic WGS84 CRS, as specified when creating the layer (see Creating a GeoDataFrame):

gdf['geometry'].crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

The .crs property can also be reached from the GeoDataFrame itself:

gdf.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

We are going to elaborate on CRS and modifying them later on (see CRS and reprojection).

Bounds (geopandas)

The .total_bounds, property of a GeoDataFrame contains the bounding box coordinates of all geometries combined, in the form of a numpy array:

gdf.total_bounds
array([ 0. , -1. ,  7.5,  1. ])

For more detailed information, the .bounds property returns the bounds of each geometry, separately, in a DataFrame:

gdf.bounds
minx miny maxx maxy
0 0.0 -1.0 7.5 0.0
1 0.0 -0.5 7.0 1.0

In both cases, bounds are returned in the standard form \((x_{min}\), \(y_{min}\), \(x_{max}\), \(y_{max})\), in agreement with the .bounds property of shapely (see Bounds (shapely)).

Reading vector layers

Usually, we read a vector layer from a file, rather than creating it from scratch (see Creating a GeoDataFrame). The gpd.read_file function can read vector layer files in any of the common formats, such as:

  • Shapefile (.shp)

  • GeoJSON (.geojson)

  • GeoPackage (.gpkg)

  • File Geodatabase (.gdb)

For example, the following expression reads a Shapefile named muni_il.shp into a GeoDataFrame named towns. This is a polygonal layer of towns in Israel:

towns = gpd.read_file('data/muni_il.shp')
towns
Muni_Heb Muni_Eng Sug_Muni ... Shape_Le_1 Shape_Area geometry
0 טירה Tira עירייה ... 19144.409231 1.185151e+07 POLYGON ((196372.646 684381.632...
1 טייבה Taybe עירייה ... 23581.471482 1.890906e+07 POLYGON ((199366.832 687791.637...
2 תמר Tamar מועצה אזורית ... 280045.935257 1.569780e+09 POLYGON ((223717.609 523705.241...
3 שפיר Shafir מועצה אזורית ... 97499.196114 8.177835e+07 POLYGON ((179909.061 607123.385...
4 שער הנגב Sha'ar Hanegev מועצה אזורית ... 124642.108454 1.805272e+08 POLYGON ((169289.462 601344.604...
... ... ... ... ... ... ... ...
422 הגליל העליון Hagalil Ha'elyon מועצה אזורית ... 15953.685374 8.914562e+06 POLYGON ((250468.035 779727.733...
423 הגליל העליון Hagalil Ha'elyon מועצה אזורית ... 7677.174194 1.735902e+06 POLYGON ((251434.056 771376.157...
424 הגליל העליון Hagalil Ha'elyon מועצה אזורית ... 50678.969970 3.626273e+07 POLYGON ((241900.500 776693.188...
425 הגליל העליון Hagalil Ha'elyon מועצה אזורית ... 84551.547966 5.682842e+07 POLYGON ((254712.930 767150.100...
426 ללא שיפוט - נחל יתלה No Jurisdiction - Nahal ytla ללא שיפוט ... 3332.103042 2.231950e+05 POLYGON ((209273.654 635045.173...

427 rows × 23 columns

Let us read another layer, RAIL_STRATEGIC.shp, into a GeoDataFrame named rail. This is a line layer of railway lines in Israel:

rail = gpd.read_file('data/RAIL_STRATEGIC.shp')
rail
UPGRADE SHAPE_LENG SHAPE_Le_1 ... ISACTIVE UPGRAD geometry
0 שדרוג 2040 14489.400742 12379.715332 ... פעיל שדרוג בין 2030 ל-2040 LINESTRING (205530.083 741562.9...
1 שדרוג 2030 2159.344166 2274.111800 ... פעיל שדרוג עד 2030 LINESTRING (181507.598 650706.1...
2 שדרוג 2040 4942.306156 4942.306156 ... לא פעיל שדרוג בין 2030 ל-2040 LINESTRING (189180.643 645433.4...
3 שדרוג 2040 2829.892202 2793.337699 ... פעיל שדרוג בין 2030 ל-2040 LINESTRING (203482.789 744181.5...
4 שדרוג 2040 1976.490872 1960.170882 ... פעיל שדרוג בין 2030 ל-2040 LINESTRING (178574.101 609392.9...
... ... ... ... ... ... ... ...
212 חדש 2030 2174.401983 2174.401983 ... לא פעיל פתיחה עד 2030 LINESTRING (195887.590 675861.9...
213 חדש 2030 0.000000 974.116931 ... לא פעיל פתיחה עד 2030 LINESTRING (183225.361 648648.2...
214 שדרוג 2030 3248.159569 1603.616014 ... פעיל שדרוג עד 2030 LINESTRING (200874.999 746209.3...
215 שדרוג 2030 2611.879099 166.180958 ... פעיל שדרוג עד 2030 LINESTRING (203769.786 744358.6...
216 שדרוג 2030 1302.971893 1284.983680 ... פעיל שדרוג עד 2030 LINESTRING (190553.481 654170.3...

217 rows × 7 columns

Note

Note that a Shapefile consists of several files, such as .shp, .shx, .dbf and .prj, even though we specify just one the .shp file path when reading a Shapefile with gpd.read_file. The same applies to writing a Shapefile (see Writing vector layers).

Plotting (geopandas)

Basic plots

A GeoDataFrame object has a .plot method, which can be used to visually examine the layer we are working with. For example:

gdf.plot();
_images/geopandas1_85_0.png

We can use the color and edgecolor parameters to specify the fill color and border color, respectively:

gdf.plot(color="none", edgecolor="red");
_images/geopandas1_87_0.png

Another useful property is linewidth, to specify line width:

gdf.plot(color='none', linewidth=10);
_images/geopandas1_89_0.png

Let us also plot the towns and rail layers to see what they look like:

towns.plot();
_images/geopandas1_91_0.png
rail.plot();
_images/geopandas1_92_0.png

Setting symbology

By default, all plotted geometries share the same appearance. We can set a symbology, where the appearance corresponds to given attributes values, using the column parameter. For example, the following expression results in a map where town fill colors correspond to the "Machoz" attribute values. We can therefore see the “Machoz” administrative division:

towns.plot(column='Machoz');
_images/geopandas1_95_0.png

The color palette for the symbology can be changed using the cmap (“color map”) argument. Valid palette names can be found in the Choosing Colormaps page of the matplotlib documentation. For example, "Set2" is one of the “Qualitative colormaps”, which is appropriate for categorical variables:

towns.plot(column='Machoz', cmap='Set2');
_images/geopandas1_97_0.png

As another example, the following expression sets a symbology corresponding to the continuous "AreaSQM" (town area in square meters) columns. The argument legend=True adds a legend:

towns.plot(column='AreaSQM', cmap='Reds', legend=True);
_images/geopandas1_99_0.png

Interactive maps

geopandas also provides the .explore method, which facilitates interactive exploration of the vector layers we are working with. The .explore method takes many of the same arguments as .plot (such as column and cmap), but also some specific ones (such as style_kwds). See the documentation for details.

For example, the following expression creates an interactive map displaying the rail layer, colored according to status (active or not). Note that hovering over a feature with the mouse shows a tooltip with the attribute values (similarly to “identify” tools in GIS software).

rail.explore(column='ISACTIVE', cmap='RdYlBu', style_kwds={'weight':8, 'opacity':0.8})
Make this Notebook Trusted to load map: File -> Trust Notebook