# Vector layers (`geopandas`

)#

```
Last updated: 2023-05-27 16:40:43
```

## Introduction#

In the previous chapter, we worked with individual geometries using the `shapely`

package (Geometries (shapely)). In GIS paractice, however, we usually work with *vector layers*, rather than individual geometries. Vector layers can be thought of:

a collection of (one or more) geometries,

associated with a Coordinate Reference System (CRS), and

associated with (zero or more) non-spatial attributes.

In this chapter, we introduce the `geopandas`

package for working with vector layers. First, we are going to see how a vector layer can be created from scratch (see Creating a GeoDataFrame), using `shapely`

geometries, to understand the relation between `shapely`

geometries and `geopandas`

data structures, namely:

`GeoSeries`

—A geometry column, i.e., a collection of geometries along with a CRS definition`GeoDataFrame`

—A vector layer

Then, we go on to cover some of the basic operations and workflows with vector layers using the `geopandas`

package:

Reading vector layers—Importing a vector layer from a file, such as a Shapefile

Plotting (geopandas)—Plotting vector layers

Filtering by attributes—Selecting features by attribute

Table to point layer—Creating a point layer from a table

Points to line (geopandas)—Converting a point layer to a line layer

Writing vector layers—Writing a vector layer to file

In the next chapter, we are going to cover *spatial* operations with `geopandas`

, such as calculating new geometries (buffers, etc.) and spatial join (see Geometric operations).

## What is `geopandas`

?#

`geopandas`

, as the name suggests, is an extension of `pandas`

that enables working with spatial (geographic) data, specifically with *vector layers*. To facilitate the representation of spatial data, `geopandas`

defines two data structures, which are extensions of the analogous data structures from `pandas`

for non-spatial data:

`GeoSeries`

—An extension of a`Series`

(see Creating a Series) for storing a sequence of geometries and a Coordinate Reference System (CRS) definition, whereas each geometry is a`shapely`

object (see Geometries (shapely))`GeoDataFrame`

—An extension of a`DataFrame`

(see Creating a DataFrame), where one of the columns is a`GeoSeries`

As will be demostrated in a moment (see Creating a GeoDataFrame), a `GeoDataFrame`

, representing a *vector layer*, is thus a combination (Fig. 28) of:

A

`GeoSeries`

, which holds the geometries in a special “geometry” columnZero or more

`Series`

, which represent the non-spatial attributes

Since `geopandas`

is based on `pandas`

, many of the `DataFrame`

workflows and operators we learned about earlier (see Tables (pandas) and Table reshaping and joins) work exactly the same way with a `GeoDataFrame`

. This is especially true for operations that involve just the non-spatial properties, such as filtering by attributes (see Filtering by attributes).

The `geopandas`

package depends on several other Python packages, most importantly:

`numpy`

`pandas`

`shapely`

—Interface to GEOS (see Geometries (shapely))`fiona`

—Interface to GDAL`pyproj`

—Interface to PROJ

To put it in perspective of other packages, `geopandas`

may be considered a high-level approach for working with vector layers in Python. A lower-level approach is provided through the `fiona`

package, which `geopandas`

depends on. The `osgeo`

package provides (through its `ogr`

object) another, more low-level, approach for vector layer access [].

Since `geopandas`

is simpler to learn and work with than its alternatives, it is the recommended approach for the purposes in this book. For instance, using `fiona`

and `osgeo`

we typically have to process vector features one-by-one inside a `for`

loop, while in `geopandas`

the entire layer is being ‘read’ and processed at once. Nevertheless, in some situations it is necessary to resort to lower-level approaches. For example, processing features one-by-one is essential when the layer is too big to fit in memory.

## Creating from scratch#

### Creating a `GeoSeries`

#

Before starting to work with `geopandas`

, we have to load it. We also load `pandas`

, `shapely.geometry`

, and `shapely.wkt`

, which we are already familiar with, since they are needed for some of the examples:

```
import pandas as pd
import shapely.geometry
import shapely.wkt
import geopandas as gpd
```

As a first step to learn about `geopandas`

, we are going to create a geometry column (`GeoSeries`

), and then a vector layer (`GeoDataFrame`

) (see Creating a GeoDataFrame) from sratch. This will help us get a better understanding of the structure of a `GeoDataFrame`

(Fig. 28).

Let us start with the geometry column, i.e., a `GeoSeries`

. The geometric component of our layer is going to be a sequence of two `'Polygon"`

geometries, hereby named `pol1`

and `pol2`

(see shapely from WKT):

```
pol1 = shapely.wkt.loads('POLYGON ((0 0, 0 -1, 7.5 -1, 7.5 0, 0 0))')
pol2 = shapely.wkt.loads('POLYGON ((0 1, 1 0, 2 0.5, 3 0, 4 0, 5 0.5, 6 -0.5, 7 -0.5, 7 1, 0 1))')
```

To create a `GeoSeries`

, we can pass a `list`

of `shapely`

geometries to the `gpd.GeoSeries`

function. The second (optional) argument passed to `gpd.GeoSeries`

is a `crs`

, specifying the Coordinate Reference System (CRS). In this case, we specify the WGS84 geographical CRS, using its EPSG code `4326`

(Table 23):

```
s = gpd.GeoSeries([pol1, pol2], crs=4326)
s
```

```
0 POLYGON ((0.00000 0.00000, 0.00...
1 POLYGON ((0.00000 1.00000, 1.00...
dtype: geometry
```

### Creating a `GeoDataFrame`

#

A `GeoDataFrame`

can be constructed from a `dict`

representing the columns, one of them containing the geometries, and the other(s) containing non-spatial attributes (if any). The `dict`

values may be `Series`

, `GeoSeries`

, or `list`

data structure. First, let us create the dictionary:

```
d = {'name': ['a', 'b'], 'geometry': s}
d
```

```
{'name': ['a', 'b'],
'geometry': 0 POLYGON ((0.00000 0.00000, 0.00...
1 POLYGON ((0.00000 1.00000, 1.00...
dtype: geometry}
```

In this particular case, we have:

the (mandatory) geometry column, conventionally named

`'geometry'`

, specified using a`GeoSeries`

(see Creating a GeoSeries), andone non-spatial attribute, named

`'name'`

, specified using a`list`

Now, we can convert the `dict`

to a `GeoDataFrame`

, using the `gpd.GeoDataFrame`

function:

```
gdf = gpd.GeoDataFrame(d)
gdf
```

name | geometry | |
---|---|---|

0 | a | POLYGON ((0.00000 0.00000, 0.00... |

1 | b | POLYGON ((0.00000 1.00000, 1.00... |

or in a single step:

```
gdf = gpd.GeoDataFrame({'name': ['a', 'b'], 'geometry': s})
gdf
```

name | geometry | |
---|---|---|

0 | a | POLYGON ((0.00000 0.00000, 0.00... |

1 | b | POLYGON ((0.00000 1.00000, 1.00... |

We can also use a `list`

of geometries instead of a `GeoSeries`

, in which case the `crs`

can be specified inside `gpd.GeoDataFrame`

:

```
gdf = gpd.GeoDataFrame({'name': ['a', 'b'], 'geometry': [pol1, pol2]}, crs=4326)
gdf
```

name | geometry | |
---|---|---|

0 | a | POLYGON ((0.00000 0.00000, 0.00... |

1 | b | POLYGON ((0.00000 1.00000, 1.00... |

Note that, the workflow to create a `GeoDataFrame`

is very similar to the way we created a `DataFrame`

with the `pd.DataFrame`

function (see Creating a DataFrame). Basically, the only difference is the presence of the geometry column.

`GeoDataFrame`

structure#

`Series`

columns#

Let us examine the structure of the `gdf`

object we just created to understand the structure of a `GeoDataFrame`

.

An individual column of a `GeoDataFrame`

can be accessed just like a column of a `DataFrame`

(see Selecting DataFrame columns). For example, the following expression returns the `"name"`

column:

```
gdf['name']
```

```
0 a
1 b
Name: name, dtype: object
```

as a `Series`

object:

```
type(gdf['name'])
```

```
pandas.core.series.Series
```

More specifically, `'name'`

is a `Series`

of type `object`

, since it contains strings.

`GeoSeries`

(geometry) column#

The following expression returns the `'geometry'`

column:

```
gdf['geometry']
```

```
0 POLYGON ((0.00000 0.00000, 0.00...
1 POLYGON ((0.00000 1.00000, 1.00...
Name: geometry, dtype: geometry
```

As mentioned above, the geometry column is a `GeoSeries`

object:

```
type(gdf['geometry'])
```

```
geopandas.geoseries.GeoSeries
```

The `'geometry'`

column is what differentiates a `GeoDataFrame`

from an ordinary `DataFrame`

.

### Individual geometries#

As evident from the way we constructed the object (see Creating a GeoDataFrame), each “cell” in the geometry column is an individual `shapely`

geometry:

```
gdf['geometry'].iloc[0]
```

```
gdf['geometry'].iloc[1]
```

If necessary, for instance when applying `shapely`

functions (see Points to line (geopandas)), the entire geometry column can be extracted as a `list`

of `shapely`

geometries using `to_list`

(see Series to list):

```
gdf['geometry'].to_list()
```

```
[<POLYGON ((0 0, 0 -1, 7.5 -1, 7.5 0, 0 0))>,
<POLYGON ((0 1, 1 0, 2 0.5, 3 0, 4 0, 5 0.5, 6 -0.5, 7 -0.5, 7 1, 0 1))>]
```

`GeoDataFrame`

properties#

### Geometry types#

The geometry types (Fig. 27) of the geometries in a `GeoDataFrame`

or a `GeoSeries`

can be obtained using the `.geom_type`

property. The result is a `Series`

of strings, with the geometry type names. For example:

```
s.geom_type
```

```
0 Polygon
1 Polygon
dtype: object
```

```
gdf.geom_type
```

```
0 Polygon
1 Polygon
dtype: object
```

The `.geom_type`

property of `geopandas`

thus serves the same purpose as the `.geom_type`

property of `shapely`

geometries (see Geometry type).

### Coordinate Reference System (CRS)#

In addition to the geometries, a `GeoSeries`

has a property named `.crs`

, which specifies the Coordinate Regerence System (CRS) of the geometries. For example, the geometries in the `gdf`

layer are in the geographic WGS84 CRS, as specified when creating the layer (see Creating a GeoDataFrame):

```
gdf['geometry'].crs
```

```
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
```

The `.crs`

property can also be reached from the `GeoDataFrame`

itself:

```
gdf.crs
```

```
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
```

We are going to elaborate on CRS and modifying them later on (see CRS and reprojection).

### Bounds (`geopandas`

)#

The `.total_bounds`

, property of a `GeoDataFrame`

contains the bounding box coordinates of all geometries combined, in the form of a `numpy`

array:

```
gdf.total_bounds
```

```
array([ 0. , -1. , 7.5, 1. ])
```

For more detailed information, the `.bounds`

property returns the bounds of each geometry, separately, in a `DataFrame`

:

```
gdf.bounds
```

minx | miny | maxx | maxy | |
---|---|---|---|---|

0 | 0.0 | -1.0 | 7.5 | 0.0 |

1 | 0.0 | -0.5 | 7.0 | 1.0 |

In both cases, bounds are returned in the standard form \((x_{min}\), \(y_{min}\), \(x_{max}\), \(y_{max})\), in agreement with the `.bounds`

property of `shapely`

(see Bounds (shapely)).

## Reading vector layers#

Usually, we read a vector layer from a file, rather than creating it from scratch (see Creating a GeoDataFrame). The `gpd.read_file`

function can read vector layer files in any of the common formats, such as:

Shapefile (

`.shp`

)GeoJSON (

`.geojson`

)GeoPackage (

`.gpkg`

)File Geodatabase (

`.gdb`

)

For example, the following expression reads a Shapefile named `muni_il.shp`

into a `GeoDataFrame`

named `towns`

. This is a polygonal layer of towns in Israel. The additional `encoding='utf-8'`

argument makes the Hebrew text correctly interpreted:

```
towns = gpd.read_file('data/muni_il.shp', encoding='utf-8')
towns
```

Muni_Heb | Muni_Eng | Sug_Muni | ... | T10 | T11 | geometry | |
---|---|---|---|---|---|---|---|

0 | ללא שיפוט - אזור מעיליא | No Jurisdiction - Mi'ilya Area | ללא שיפוט | ... | NaN | NaN | POLYGON Z ((225369.655 770523.6... |

1 | תמר | Tamar | מועצה אזורית | ... | NaN | NaN | POLYGON Z ((206899.135 552967.8... |

2 | שהם | Shoham | מועצה מקומית | ... | NaN | NaN | POLYGON Z ((194329.250 655299.1... |

3 | שהם | Shoham | מועצה מקומית | ... | NaN | NaN | POLYGON Z ((196236.573 657835.0... |

4 | ראש פינה | Rosh Pina | מועצה מקומית | ... | NaN | NaN | POLYGON Z ((255150.135 764764.6... |

... | ... | ... | ... | ... | ... | ... | ... |

405 | אופקים | Ofakim | עירייה | ... | NaN | NaN | POLYGON Z ((164806.146 577898.8... |

406 | אבן יהודה | Even Yehuda | מועצה מקומית | ... | NaN | NaN | POLYGON Z ((189803.359 684152.9... |

407 | אבו סנאן | Abu Snan | מועצה מקומית | ... | NaN | NaN | POLYGON Z ((212294.953 763168.8... |

408 | אבו גוש | Abu Gosh (Abu Rosh) | מועצה מקומית | ... | NaN | NaN | POLYGON Z ((209066.649 635655.2... |

409 | שדות נגב | Sdot Negev | מועצה אזורית | ... | NaN | NaN | POLYGON Z ((162082.027 592043.1... |

410 rows × 38 columns

Let us read another layer, `RAIL_STRATEGIC.shp`

, into a `GeoDataFrame`

named `rail`

. This is a line layer of railway lines in Israel:

```
rail = gpd.read_file('data/RAIL_STRATEGIC.shp')
rail
```

UPGRADE | SHAPE_LENG | SHAPE_Le_1 | ... | ISACTIVE | UPGRAD | geometry | |
---|---|---|---|---|---|---|---|

0 | שדרוג 2040 | 14489.400742 | 12379.715332 | ... | פעיל | שדרוג בין 2030 ל-2040 | LINESTRING (205530.083 741562.9... |

1 | שדרוג 2030 | 2159.344166 | 2274.111800 | ... | פעיל | שדרוג עד 2030 | LINESTRING (181507.598 650706.1... |

2 | שדרוג 2040 | 4942.306156 | 4942.306156 | ... | לא פעיל | שדרוג בין 2030 ל-2040 | LINESTRING (189180.643 645433.4... |

3 | שדרוג 2040 | 2829.892202 | 2793.337699 | ... | פעיל | שדרוג בין 2030 ל-2040 | LINESTRING (203482.789 744181.5... |

4 | שדרוג 2040 | 1976.490872 | 1960.170882 | ... | פעיל | שדרוג בין 2030 ל-2040 | LINESTRING (178574.101 609392.9... |

... | ... | ... | ... | ... | ... | ... | ... |

212 | חדש 2030 | 2174.401983 | 2174.401983 | ... | לא פעיל | פתיחה עד 2030 | LINESTRING (195887.590 675861.9... |

213 | חדש 2030 | 0.000000 | 974.116931 | ... | לא פעיל | פתיחה עד 2030 | LINESTRING (183225.361 648648.2... |

214 | שדרוג 2030 | 3248.159569 | 1603.616014 | ... | פעיל | שדרוג עד 2030 | LINESTRING (200874.999 746209.3... |

215 | שדרוג 2030 | 2611.879099 | 166.180958 | ... | פעיל | שדרוג עד 2030 | LINESTRING (203769.786 744358.6... |

216 | שדרוג 2030 | 1302.971893 | 1284.983680 | ... | פעיל | שדרוג עד 2030 | LINESTRING (190553.481 654170.3... |

217 rows × 7 columns

Note

Note that a Shapefile consists of several files, such as `.shp`

, `.shx`

, `.dbf`

and `.prj`

, even though we specify just one the `.shp`

file path when reading a Shapefile with `gpd.read_file`

. The same applies to writing a Shapefile (see Writing vector layers).

## Plotting (`geopandas`

)#

### Basic plots#

A `GeoDataFrame`

object has a `.plot`

method, which can be used to visually examine the layer we are working with. For example:

```
gdf.plot();
```

We can use the `color`

and `edgecolor`

parameters to specify the fill color and border color, respectively:

```
gdf.plot(color="none", edgecolor="red");
```

Another useful property is `linewidth`

, to specify line width:

```
gdf.plot(color='none', linewidth=10);
```

Let us also plot the `towns`

and `rail`

layers to see what they look like:

```
towns.plot();
```

```
rail.plot();
```

### Setting symbology#

By default, all plotted geometries share the same appearance. We can set a symbology, where the appearance corresponds to given attributes values, using the `column`

parameter. For example, the following expression results in a map where town fill colors correspond to the `"Machoz"`

attribute values. We can therefore see the “Machoz” administrative division:

```
towns.plot(column='Machoz');
```

The color palette for the symbology can be changed using the `cmap`

(“color map”) argument. Valid palette names can be found in the Choosing Colormaps page of the `matplotlib`

documentation. For example, `"Set2"`

is one of the “Qualitative colormaps”, which is appropriate for categorical variables:

```
towns.plot(column='Machoz', cmap='Set2');
```

As another example, the following expression sets a symbology corresponding to the continuous `"Shape_Area"`

(town area in square meters) columns. The argument `legend=True`

adds a legend:

```
towns.plot(column='Shape_Area', cmap='Reds', legend=True);
```

### Interactive maps#

`geopandas`

also provides the `.explore`

method, which facilitates interactive exploration of the vector layers we are working with. The `.explore`

method takes many of the same arguments as `.plot`

(such as `column`

and `cmap`

), but also some specific ones (such as `style_kwds`

). See the documentation for details.

For example, the following expression creates an interactive map displaying the `rail`

layer, colored according to status (active or not). Note that hovering over a feature with the mouse shows a tooltip with the attribute values (similarly to “identify” tools in GIS software).

```
rail.explore(column='ISACTIVE', cmap='RdYlBu', style_kwds={'weight':8, 'opacity':0.8})
```