Geometric operations#

Last updated: 2024-05-12 10:19:36

Introduction#

In the previous chapter we introduced the geopandas package, focusing on non-spatial operations such as subsetting by attributes (see Filtering by attributes), as well as geometry-related transformations (see Table to point layer and Points to line (geopandas)).

In this chapter, we go further into operations that involve the geometric component of one or more GeoDataFrame, using the geopandas package. This group of operations includes the following standard spatial analysis functions:

By the end of this chapter, you will have an overview of all common worflows in spatial analysis of vector layers.

Sample data#

Overview#

First, let’s prepare the environment to demostrate geometric operations using geopandas. First, we load the pandas and geopandas packages:

import pandas as pd
import geopandas as gpd

Next, let’s load three layers, subset the relevant columns, and examine them visually. The first two (towns and rail) are familiar from the previous chapter (see Reading vector layers), while the third one (stations) is new:

  • towns—A polygonal layer of towns (see Towns)

  • rail—A line layer of railway lines (see Railway lines)

  • stations—A point layer of railway stations (see Railway stations)

Towns#

The polygonal towns layer is imported from the Shapefile named 'muni_il.shp'. Then we select the attributes of interest (plus the 'geometry' column):

towns = gpd.read_file('data/muni_il.shp', encoding='utf-8')
towns = towns[['Muni_Heb', 'Machoz', 'Shape_Area', 'geometry']]
towns
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
/home/michael/.local/lib/python3.10/site-packages/pyogrio/geopandas.py:49: FutureWarning: errors='ignore' is deprecated and will raise in a future version. Use to_datetime without passing `errors` and catch exceptions explicitly instead
  res = pd.to_datetime(ser, **datetime_kwargs)
Muni_Heb Machoz Shape_Area geometry
0 ללא שיפוט - אזור מעיליא צפון 2.001376e+05 POLYGON Z ((225369.655 770523.6...
1 תמר דרום 1.597418e+09 POLYGON Z ((206899.135 552967.8...
2 שהם מרכז 6.177438e+06 POLYGON Z ((194329.25 655299.13...
3 שהם מרכז 9.850450e+05 POLYGON Z ((196236.573 657835.0...
4 ראש פינה צפון 6.104233e+04 POLYGON Z ((255150.135 764764.6...
... ... ... ... ...
405 אופקים דרום 1.635240e+07 POLYGON Z ((164806.146 577898.8...
406 אבן יהודה מרכז 8.141962e+06 POLYGON Z ((189803.359 684152.9...
407 אבו סנאן צפון 6.455340e+06 POLYGON Z ((212294.953 763168.8...
408 אבו גוש ירושלים 1.891242e+06 POLYGON Z ((209066.649 635655.2...
409 שדות נגב דרום 2.554627e+05 POLYGON Z ((162082.027 592043.1...

410 rows × 4 columns

Here is the towns layer, colored by the 'Machoz' administrative division:

towns.plot(column='Machoz', edgecolor='black', linewidth=0.2);
_images/275d64767b4b14c7ba92153597204cbc8ef57a16dd26aed95bf8563672d1d3e7.png

Railway lines#

Next, we read the railway line layer, from the Shapefile named 'RAIL_STRATEGIC.shp'. We subset the layer by attributes (see Filtering by attributes) to retain just the active railway segments. Then, we subset just the 'SEGMENT' (segment name) and 'geometry' columns:

rail = gpd.read_file('data/RAIL_STRATEGIC.shp')
rail = rail[rail['ISACTIVE'] == 'פעיל']
rail = rail[['SEGMENT', 'geometry']]
rail
SEGMENT geometry
0 כפר יהושע - נשר_1 LINESTRING (205530.083 741562.9...
1 באר יעקב-ראשונים_2 LINESTRING (181507.598 650706.1...
3 לב המפרץ מזרח - נשר_4 LINESTRING (203482.789 744181.5...
4 קרית גת - להבים_5 LINESTRING (178574.101 609392.9...
5 רמלה - רמלה מזרח_6 LINESTRING (189266.58 647211.50...
... ... ...
210 ויתקין - חדרה_215 LINESTRING (190758.23 704950.04...
211 בית יהושע - נתניה ספיר_216 LINESTRING (187526.597 687360.3...
214 השמונה - בית המכס_220 LINESTRING (200874.999 746209.3...
215 לב המפרץ - בית המכס_221 LINESTRING (203769.786 744358.6...
216 224_לוד מרכז - נתבג LINESTRING (190553.481 654170.3...

161 rows × 2 columns

Here is a plot of the rail layer, colored by segment name:

rail.plot(column='SEGMENT');
_images/7b6c3a594006b9918fca1bd19215133209961eeaa19b2848dd252562f67de1c9.png

Railway stations#

Finally, we read a layer of rainway stations, from a Shapefile named 'RAIL_STAT_ONOFF_MONTH.shp'. The attributes of interest which we retain (in addition to the 'geometry') are:

  • 'STAT_NAMEH'—Station name

  • 'OFF_7_DAY'—Passengers exiting the station at 7:00-8:00 during September 2020

stations = gpd.read_file('data/RAIL_STAT_ONOFF_MONTH.shp')
stations = stations[['STAT_NAMEH', 'OFF_7_DAY', 'geometry']]
stations
STAT_NAMEH OFF_7_DAY geometry
0 ראשון לציון משה דיין 72.0 POINT (177206.094 655059.936)
1 קרית ספיר נתניה 119.0 POINT (187592.123 687587.598)
2 ת"א השלום 1642.0 POINT (180621.94 664537.21)
3 שדרות 26.0 POINT (160722.849 602798.889)
4 רמלה 47.0 POINT (188508.91 648466.87)
... ... ... ...
58 מגדל העמק כפר ברוך 5.0 POINT (219851.931 728164.825)
59 באר שבע אוניברסיטה 130.0 POINT (181813.614 574550.411)
60 חיפה בת גלים 311.0 POINT (198610.01 748443.79)
61 לוד 238.0 POINT (188315.977 650289.373)
62 נהריה 64.0 POINT (209570.59 767769.22)

63 rows × 3 columns

Here is a visualization of the stations layer. Using symbology, we demonstrate that the largest number of passengers exiting a station in 7:00-8:00 are in Tel-Aviv, where lots of people travel to work:

stations.plot(column='OFF_7_DAY', cmap='Reds', legend=True);
_images/45ee3d5be9c8adc1ab98f0405469d378065f63b19908017b8f24b7c29615c2d8.png

Here is the plot of the towns, stations, and rail layers, together (see Plotting more than one layer):

base = towns.plot(color='white', edgecolor='lightgrey')
stations.plot(ax=base, color='red')
rail.plot(ax=base, color='blue');
_images/b56d7c795245339a81dc271b1de85a54e577b4afdc47c624967f3f7e38cc5766.png

CRS and reprojection#

What is a CRS?#

A Coordinate Reference System (CRS) defines the association between coordinates (which are ultimately just numbers) and specific locations on the surface of the earth. CRS can be divided into two groups:

  • Geographic CRS, where units are degrees (e.g., WGS84)

  • Projected CRS, where using are “real” distance units, typically meters (e.g., ITM, UTM)

There are several methods and formats to describe and specify a CRS. The shortest and simplest method is to use an EPSG code, which all standard CRS have. For example, Table 24 specifies the EPSG codes of CRS we encounter in this book.

Table 24 Coordinate Reference Systems (CRS) used in this book#

Name

Type

Units

EPSG code

WGS84

Geographic

Decimal degrees (\(^\circ\))

4326

ITM

Projected

Meters (\(m\))

2039

UTM Zone 36N

Projected

Meters (\(m\))

32636

Geographic CRS refer to the entire surface of the Earth, while projected CRS commonly refer to smaller areas. A fundamental difference between geographical and projected CRS is in their units. Geographic CRS refer to location on a sphere, using degree units, while projected CRS refer to locations on an approximated planar surface. One important implication of the unit difference is that planar gemetric calculations (such as distance and area) make sense in projected CRS, but not in geographic CRS.

Note

There are several formats, or standards, to specify a CRS, in Python and elsewhere. The most common ones are:

  • EPSG code—e.g., 4326

  • PROJ string—e.g., '+proj=longlat +datum=WGS84 +no_defs'

  • WKT string—e.g., 'GEOGCS["WGS 84",DATUM["WGS_1984",...[etc.]'

In this book, we exclusively use EPSG codes to define projections, for their simplicity.

Examining with .crs#

As mentioned previously (see GeoDataFrame structure), the CRS definition of a GeoSeries (or GeoDataFrame) is accessible through the .crs property. For example, here is the CRS definition of the rail layer:

rail.crs
<Derived Projected CRS: EPSG:2039>
Name: Israel 1993 / Israeli TM Grid
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Israel - onshore; Palestine Territory - onshore.
- bounds: (34.17, 29.45, 35.69, 33.28)
Coordinate Operation:
- name: Israeli TM
- method: Transverse Mercator
Datum: Israel 1993
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

The rail layer is in the ITM system. The EPSG code of this CRS is 2039. Exploring the printout can also reveal that the CRS units are meters, and that the geographical area where the CRS is supposed to be used is Israel.

Modifying with .set_crs#

To modify the CRS definition of a given layer, we can use the .set_crs method. In case the layer already has a CRS definition, we must use the allow_override=True option. Otherwise, we get an error. The allow_override argument is a safeguard against replacing the CRS definition of a layer when it already has one, which only makes sense if the existing CRS definition is wrong and we need to “fix” it (see below).

For example, the following expression replaces the CRS definition of rail (which is ITM, 2039) with another CRS (WGS84, 4326):

rail = rail.set_crs(4326, allow_override=True)

We can examine the .crs property, to see that the CRS definition has indeed changed:

rail.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

It is important to note that replacing the CRS definition has no effect on the coordinates. For example, the coordinates of rail are still in ITM units, rather than decimal degrees:

print(rail.geometry.iloc[11])
LINESTRING (188700.04908514675 648249.5428679045, 188521.65428514872 648456.553667903)

This means that the layer is now incorrect (!), since the coordinates do not match the CRS definition. Replacing the CRS definition should only be done when:

  • The layer has no CRS definition, e.g., it was accidentally lost, and we need to restore it

  • The existing CRS definition is incorrect, and we need to replace it

Let’s modify the CRS definition back to the original, to “fix” the layer we have just broken:

rail = rail.set_crs(2039, allow_override=True)

Note

To remove the CRS definition of a given layer altogether, you can assign None to the .crs property, as in rail.crs=None.

Reprojecting with .to_crs#

Reprojecting a vector layer involves not just replacing the CRS definition (see Modifying with .set_crs), but also re-calculating all of the layer coordinates, so that they are in agreement with the new CRS.

To reproject a GeoSeries or GeoDataFrame, we use the .to_crs method. For example, here is how we can transform the rail layer to WGS84:

rail = rail.to_crs(4326)

Comparing the updated .crs property of rail with the original one (see above), we can see that the CRS definition indeed changed:

rail.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Importantly, the coordinates also changed, which can be demonstrated by printing the WKT representation of any given geometry from the layer (compare with the above printout):

print(rail.geometry.iloc[11])
LINESTRING (34.87921676021756 31.926817568737928, 34.87732378978819 31.928679560071494)

Thus the information in the rail layer is still “correct”, as the coordinates match the CRS. It is just in a different CRS.

Finally, we can see that the axis text labels when plotting the layer have changed, now reflecting decimal degree values of the WGS84 CRS:

rail.plot();
_images/2a22cb37ed014753990037798da0a8d8ba6852acde4344f29ca339bc882550b6.png

Let us reproject rail back to the original CRS (i.e., ITM) before we proceed:

rail = rail.to_crs(epsg=2039)

Note

More information on specifying projections, and reprojecting layers, can be found in the Projections article.

Numeric calculations#

Overview#

In this section, we cover geometric commonly used calculations that result in numeric values, using geopandas. As we will see, the first three operations are very similar to their shapely counterparts (Table 25). The difference is that the calculation is applied on multiple geometries, rather than just one.

Table 25 Numeric geometry calculations in shapely and geopandas#

Method

shapely

geopandas

.length

see Length (shapely)

see Length (geopandas)

.area

see Area (shapely)

see Area (geopandas)

.distance

see Distance (shapely)

see Distance (geopandas)

The fourth operation, detecting nearest neighbors, which we cover last (see Nearest neighbors), is a little more complex and not really analogous to any shapely-based workflow.

Length (geopandas)#

The .length property returns a Series with line lengths per geometry in a GeoSeries or GeoDataFrame. For example, rail.length returns a Series of railway segment lengths:

rail.length
0      12379.715331
1       2274.111799
3       2793.337699
4       1960.170882
5       1195.701220
           ...     
210     5975.045997
211     1913.384027
214     1603.616014
215      166.180958
216     1284.983680
Length: 161, dtype: float64

Keep in mind that all numeric calculations, such as length, area (see Area (geopandas)), and distance (see Distance (geopandas)), are returned in the coordinate units, that is, in CRS units. In our case, the CRS is ITM, where the units are meters (\(m\)) (Table 24). The following expression transforms the values from \(m\) to \(km\):

rail.length / 1000
0      12.379715
1       2.274112
3       2.793338
4       1.960171
5       1.195701
         ...    
210     5.975046
211     1.913384
214     1.603616
215     0.166181
216     1.284984
Length: 161, dtype: float64

This Series can be assigned to into a new column, such as a column named 'length_km', as follows:

rail['length_km'] = rail.length / 1000
rail
SEGMENT geometry length_km
0 כפר יהושע - נשר_1 LINESTRING (205530.083 741562.9... 12.379715
1 באר יעקב-ראשונים_2 LINESTRING (181507.598 650706.1... 2.274112
3 לב המפרץ מזרח - נשר_4 LINESTRING (203482.789 744181.5... 2.793338
4 קרית גת - להבים_5 LINESTRING (178574.101 609392.9... 1.960171
5 רמלה - רמלה מזרח_6 LINESTRING (189266.58 647211.50... 1.195701
... ... ... ...
210 ויתקין - חדרה_215 LINESTRING (190758.23 704950.04... 5.975046
211 בית יהושע - נתניה ספיר_216 LINESTRING (187526.597 687360.3... 1.913384
214 השמונה - בית המכס_220 LINESTRING (200874.999 746209.3... 1.603616
215 לב המפרץ - בית המכס_221 LINESTRING (203769.786 744358.6... 0.166181
216 224_לוד מרכז - נתבג LINESTRING (190553.481 654170.3... 1.284984

161 rows × 3 columns

All geometric operations in geopandas assume planar geometry, returning the result in the CRS units. This means that layers in a geographic CRS (such as WGS84) need to be transformed to a projected CRS (see What is a CRS?) before any geometric calculation. Otherwise, numeric measurements, such as length, area, and distance, are going to be returned in decimal degree units, which usually does not make sense, as the relation between degrees and geographic distance is not a fixed one.

Exercise 09-a

  • Reproject the rail layer to geographic coordinates (4326), then calculate railway lengths once again. You should see the (meaningless) results in decimal degrees.

Area (geopandas)#

The .area property returns a Series with the area per geometry in a GeoSeries or GeoDataFrame. For example, towns.area returns the area sizes of each polygon in the towns layer, again in CRS units (\(m^2\)):

towns.area
0      2.001376e+05
1      1.597418e+09
2      6.177438e+06
3      9.850450e+05
4      6.104233e+04
           ...     
405    1.635240e+07
406    8.141962e+06
407    6.455340e+06
408    1.891242e+06
409    2.554627e+05
Length: 410, dtype: float64

If necessary, this series can be assigned to a new column. For example, the following expression divides the area values by \(1000^2\), i.e., 1000**2 to convert from \(m^2\) to \(km^2\), then assigns the results to a new column named 'area_km2':

towns['area_km2'] = towns.area / 1000**2
towns
Muni_Heb Machoz Shape_Area geometry area_km2
0 ללא שיפוט - אזור מעיליא צפון 2.001376e+05 POLYGON Z ((225369.655 770523.6... 0.200138
1 תמר דרום 1.597418e+09 POLYGON Z ((206899.135 552967.8... 1597.417603
2 שהם מרכז 6.177438e+06 POLYGON Z ((194329.25 655299.13... 6.177438
3 שהם מרכז 9.850450e+05 POLYGON Z ((196236.573 657835.0... 0.985045
4 ראש פינה צפון 6.104233e+04 POLYGON Z ((255150.135 764764.6... 0.061042
... ... ... ... ... ...
405 אופקים דרום 1.635240e+07 POLYGON Z ((164806.146 577898.8... 16.352399
406 אבן יהודה מרכז 8.141962e+06 POLYGON Z ((189803.359 684152.9... 8.141962
407 אבו סנאן צפון 6.455340e+06 POLYGON Z ((212294.953 763168.8... 6.455340
408 אבו גוש ירושלים 1.891242e+06 POLYGON Z ((209066.649 635655.2... 1.891242
409 שדות נגב דרום 2.554627e+05 POLYGON Z ((162082.027 592043.1... 0.255463

410 rows × 5 columns

Exercise 09-b

  • Find out the names of the largest and the smallest municipal area polygon, in terms of area size, based on the towns layer. (answer: 'ללא שיפוט - אזור נחל ארבל' and 'רמת נגב')

Exercise 09-c

  • Check the association between the 'area_km2' column we calculated, and the 'Shape_Area' column which is stored in the Shapefile, using a scatterplot (see Scatterplots (pandas)) (Fig. 56).

_images/8f7e93fede451defcbc599fd57410cceca956394a1c83729415c9ec011b308ac.png

Fig. 56 Solution of exercise-09-c: Association between the (calculated) 'area_km2' column and the (original) 'Shape_Area' column#

Distance (geopandas)#

The .distance method can be used to calculate distances. Each distance is calculated exactly the same way as in shapely, i.e., minimal distance between the geometries assuming planar coordinates. However, since we are dealing with multiple inputs, the geopandas mode of operation is different. Specifically, the geopandas operation of .distance (and other pairwise methods, see below) can work in two “modes”:

  • One-to-many— when the “second” argument is a shapely geometry (Fig. 57, left)

  • Pairwise—when the “second” argument is a GeoSeries (Fig. 57, right)

_images/gpd-one_to_many_pairsise.svg

Fig. 57 geopandas modes of operation for methods accepting pairs of inputs (.intersection, etc.): many-to-one (left) and pairwise (right)#

The following example demonstrates the latter. Here, we are calculating the distances between:

  • stations—all railway stations, and

  • rail.geometry.iloc[0]—the first railway segment.

The result is a Series of distances, in \(m\):

stations.distance(rail.geometry.iloc[0])
0      84284.975205
1      50424.920267
2      74235.170679
3     138847.580690
4      86603.473121
          ...      
58      8595.362486
59    160138.041969
60      9758.724335
61     84904.898970
62     26515.877484
Length: 63, dtype: float64

What if we need to calculate the pairwise distances, between each station and each railway segment? This can be done using .apply (see apply and custom functions). The following expression, in plain language, means that we take each row (axis=1) in stations as x, then calculate the Series of distances from all railway segments to the geometry of that row (rail.distance(x.geometry)).

The result is a DataFrame representing a distance matrix, where:

  • Rows represent railway stations

  • Columns represent railway segments

  • Cell values are distances, in meters

d = stations.apply(lambda x: rail.distance(x.geometry), axis=1)
d
0 1 3 4 5 ... 210 211 214 215 216
0 84284.975205 6120.325293 91266.392666 43769.104489 13360.106887 ... 46002.039665 31998.139305 94172.402848 93091.940077 13376.996382
1 50424.920267 37379.975682 57120.870160 76795.773036 39353.653794 ... 11924.365438 236.552504 60107.821537 58958.887738 33548.162167
2 74235.170679 13859.386195 81199.631964 53259.279461 18180.867522 ... 35958.589113 21932.455984 84145.901064 83037.130172 14356.406044
3 138847.580690 51367.336965 146064.901194 19030.199927 52794.173376 ... 100790.724935 86796.547920 148925.381315 147885.212962 58403.590869
4 86603.473121 5697.560021 94960.212976 38438.916853 289.422152 ... 50857.456340 37074.116175 98521.682411 97044.665341 4964.211497
... ... ... ... ... ... ... ... ... ... ... ...
58 8595.362486 86339.689331 20244.761995 123902.486603 85772.327241 ... 37220.552915 52057.048736 26186.457201 22822.683088 79583.761529
59 160138.041969 74611.504901 169029.718846 34992.788653 73042.322269 ... 125012.376699 111083.909066 172714.053593 171172.498821 78870.029306
60 9758.724335 99222.671750 6473.850217 138574.693149 100683.138956 ... 44196.792329 62080.870950 1709.845960 6471.883040 94617.022408
61 84904.898970 5579.557103 93200.557041 40156.246177 2075.672842 ... 49040.957113 35245.583872 96738.721423 95275.546196 3553.733949
62 26515.877484 120379.789258 24360.621700 159481.427917 121328.202345 ... 65575.558369 83375.846025 22375.584322 24118.556926 115179.615344

63 rows × 161 columns

Note that the first column in the above result is identical to the Series we got in the previous expression.

The distance matrix can be refined to calculate more specific properties related to distance that we are interested in. For example, to get the minimum distance to any of the railway segments (rather than all distances), we can apply the .min() method on each row, as follows. The result is a Series of minimal distances per station:

nearest_seg_dist = d.min(axis=1)
nearest_seg_dist
0      13.440327
1       4.932094
2      16.180832
3      64.381044
4       2.941401
         ...    
58     56.102377
59      7.540327
60     44.130049
61     22.556829
62    115.502833
Length: 63, dtype: float64

We can assign it to a new column named 'nearest_seg_dist' (“distance to nearest segment”) in stations, as follows:

stations['nearest_seg_dist'] = nearest_seg_dist
stations
STAT_NAMEH OFF_7_DAY geometry nearest_seg_dist
0 ראשון לציון משה דיין 72.0 POINT (177206.094 655059.936) 13.440327
1 קרית ספיר נתניה 119.0 POINT (187592.123 687587.598) 4.932094
2 ת"א השלום 1642.0 POINT (180621.94 664537.21) 16.180832
3 שדרות 26.0 POINT (160722.849 602798.889) 64.381044
4 רמלה 47.0 POINT (188508.91 648466.87) 2.941401
... ... ... ... ...
58 מגדל העמק כפר ברוך 5.0 POINT (219851.931 728164.825) 56.102377
59 באר שבע אוניברסיטה 130.0 POINT (181813.614 574550.411) 7.540327
60 חיפה בת גלים 311.0 POINT (198610.01 748443.79) 44.130049
61 לוד 238.0 POINT (188315.977 650289.373) 22.556829
62 נהריה 64.0 POINT (209570.59 767769.22) 115.502833

63 rows × 4 columns

Nearest neighbors#

Here is another technique to extract information from the distance matrix, expanding the above expression of distance to nearest neighbor. This time, we calculate the name ('SEGMENT' column) of the nearest railway segment for each station. We are using the .idxmin method (see Table 15), instead of .min, to get the index of the nearest segment (instead of the distance to it):

ids = d.idxmin(axis=1)
ids
0      22
1     100
2      57
3     131
4      87
     ... 
58    152
59    205
60    153
61     77
62     12
Length: 63, dtype: int64

The index can be used to subset segment names (rail['SEGMENT']). The result is a series of the nearest segment names:

nearest_seg_name = rail['SEGMENT'].loc[ids]
nearest_seg_name
22              משה דיין-קוממיות_23
100    נתניה מכללה - נתניה ספיר_101
57           סבידור מרכז - השלום_58
131              שדרות-יד מרדכי_134
87                    לוד - רמלה_88
                   ...             
152            עפולה - כפר ברוך_155
205    באר שבע צפון - באר שבע אוניב
153          חוף כרמל - בת גלים_156
77                    לוד - רמלה_78
12                   נהריה - עכו_13
Name: SEGMENT, Length: 63, dtype: object

Let us attach the result to another column in the stations table, named nearest_seg_name (“nearest segment name”). We need to reset the indices to make sure the column is attached by position:

nearest_seg_name = nearest_seg_name.reset_index(drop=True)
nearest_seg_name
0              משה דיין-קוממיות_23
1     נתניה מכללה - נתניה ספיר_101
2           סבידור מרכז - השלום_58
3               שדרות-יד מרדכי_134
4                    לוד - רמלה_88
                  ...             
58            עפולה - כפר ברוך_155
59    באר שבע צפון - באר שבע אוניב
60          חוף כרמל - בת גלים_156
61                   לוד - רמלה_78
62                  נהריה - עכו_13
Name: SEGMENT, Length: 63, dtype: object
stations['nearest_seg_name'] = nearest_seg_name
stations
STAT_NAMEH OFF_7_DAY geometry nearest_seg_dist nearest_seg_name
0 ראשון לציון משה דיין 72.0 POINT (177206.094 655059.936) 13.440327 משה דיין-קוממיות_23
1 קרית ספיר נתניה 119.0 POINT (187592.123 687587.598) 4.932094 נתניה מכללה - נתניה ספיר_101
2 ת"א השלום 1642.0 POINT (180621.94 664537.21) 16.180832 סבידור מרכז - השלום_58
3 שדרות 26.0 POINT (160722.849 602798.889) 64.381044 שדרות-יד מרדכי_134
4 רמלה 47.0 POINT (188508.91 648466.87) 2.941401 לוד - רמלה_88
... ... ... ... ... ...
58 מגדל העמק כפר ברוך 5.0 POINT (219851.931 728164.825) 56.102377 עפולה - כפר ברוך_155
59 באר שבע אוניברסיטה 130.0 POINT (181813.614 574550.411) 7.540327 באר שבע צפון - באר שבע אוניב
60 חיפה בת גלים 311.0 POINT (198610.01 748443.79) 44.130049 חוף כרמל - בת גלים_156
61 לוד 238.0 POINT (188315.977 650289.373) 22.556829 לוד - רמלה_78
62 נהריה 64.0 POINT (209570.59 767769.22) 115.502833 נהריה - עכו_13

63 rows × 5 columns

The last example can be considered as a “manually” constructed nearest-neighbor spatial join, since we actually joined an attribute ('SEGMENT') value from one layer (rail) to another layer (stations), based on minimum distance. Towards the end of this chapter we will introduce the the gpd.sjoin_nearest function which can be used for nearest-neighbour spatial join using much shorter code (see Nearest neighbor join).

New geometries 1 (geopandas)#

Overview#

In this section and the next one, we learn about functions that calculate new geometries. We are already familiar with the this type of operations for individual geometry objects, using shapely (see New geometries 1 (shapely) and New geometries 2 (shapely)). Now, we learn about the same type of operations for vector layers, which include more than one geometry. In agreement with shapely method, we split the discussion into two parts: operations on single geometries (this section), and operations on pairs of geometries (New geometries 2 (geopandas)).

The single-geometry methods are similar to the shapely methods (Table 26), only that they accept a GeoSeries (or GeoDataFrame) and return a new GeoSeries with the results. In fact, the geopandas methods are simply “wrappers” of the respective shapely methods. We will demonstrate just the two most commonly used methods, .centroid (Centroids (geopandas)) and .buffer (Buffers (geopandas)). For example, when applying .centroid on a vector layer or geometry column, we get a new geometry column, with the same number of geometries as the original, where each geometry is the centroid of the respective original geometry.

Table 26 geopandas properties and methods which create a new geometry from an input GeoSeries or GeoDataFrame#

Property or Method

Meaning

.centroid

Centroids (geopandas)

.buffer()

Buffers (geopandas)

.convex_hull

Convex Hull

.envelope

Envelope

.minimum_rotated_rectangle

Minimum rotated rectangle

.simplify()

Simplified geometry

Centroids (geopandas)#

The .centroid method, applied on a GeoSeries or GeoDataFrame, returns a new GeoSeries with the centroids. For example, here is how we can create a GeoSeries of town polygon centroids:

ctr = towns.centroid
ctr
0      POINT (225182.898 770809.745)
1      POINT (228716.918 561880.946)
2       POINT (195454.34 655942.058)
3       POINT (196351.843 658288.23)
4      POINT (254877.752 764800.811)
                   ...              
405    POINT (165198.012 579543.277)
406    POINT (189227.026 686069.086)
407    POINT (215379.077 763146.384)
408    POINT (210369.545 634906.185)
409    POINT (161682.995 591763.635)
Length: 410, dtype: geometry

And here is a plot of the result:

ctr.plot(markersize=15, edgecolor='black');
_images/050490331a36de21f773bbe2837fff0b28246ad1863e56a873d2ece5275de63e.png

Buffers (geopandas)#

The .buffer method buffers each geometry in a given GeoDataFrame or GeoSeries to the specified distance. It returns a GeoSeries with the resulting polygons. For example, here we create buffers of 5,000 \(m\) (i.e., 5 \(km\)) around each railway station:

stations_buffer = stations.buffer(5000)
stations_buffer
0     POLYGON ((182206.094 655059.936...
1     POLYGON ((192592.123 687587.598...
2     POLYGON ((185621.94 664537.21, ...
3     POLYGON ((165722.849 602798.889...
4     POLYGON ((193508.91 648466.87, ...
                     ...                
58    POLYGON ((224851.931 728164.825...
59    POLYGON ((186813.614 574550.411...
60    POLYGON ((203610.01 748443.79, ...
61    POLYGON ((193315.977 650289.373...
62    POLYGON ((214570.59 767769.22, ...
Length: 63, dtype: geometry

Here is a plot of the result:

stations_buffer.plot(color='none');
_images/2b862c7d53ebbc875ab291b138a89cb0ceaedf9c5c0440315e065e8405e61914.png

Exercise 09-d

  • As shown above, centroids and buffers are returned as a GeoSeries.

  • Suppose that, instead, you need a GeoDataFrame with the original attributes of the source layer (e.g., the railway station properties) and the centroid, or buffer, geometries. How can we create such GeoDataFrames?

New geometries 2 (geopandas)#

Pairwise methods#

geopandas defines analogous pairwise methods (Table 27), with the same functionality as those we are familiar from shapely (Table 23).

Table 27 geopandas pairwise methods which create a new GeoSeries, given a GeoSeries and a shapely geometry (many-to-one) or two GeoSeries (pairwise)#

Method

Meaning

.intersection

Intersection

.union

Union

.difference

Difference

.symmetric_difference

Symmetric difference

The methods listed in Table 27 can also be used in the two modes mentioned for .distance (Distance (geopandas)), namely one-to-many or pairwise (Fig. 57). In the following example we demonstrate the first mode (many-to-one), which is both simpler and more useful. For the first (“many”) input we’ll take towns, while for the second (“one”) geometry, we’ll take a particular railway segment, part of the line that goes from Tel-Aviv to Jerusalem:

seg = rail[rail['SEGMENT'] == 'נתבג - יצחק נבון_198']
seg
SEGMENT geometry length_km
194 נתבג - יצחק נבון_198 LINESTRING (194999.696 643851.4... 27.985833

Given towns and seg, we will calculate pairwise intersections, that is, the segment portions that go through each town.

An interactive map the inputs, seg and the towns features it intersects with, is shown below. Note that we are using .intersects to filter towns, which will be explained later (Boolean methods (geopandas)). Also note the technique being used to show two different layers in the same .explore plot (similar to what we learned about .plot, see Plotting more than one layer).

m = towns[towns.intersects(seg.unary_union)].explore()
seg.explore(m = m, color='red', style_kwds={'weight':8, 'opacity':0.8})
/tmp/ipykernel_1314487/2153663608.py:1: FutureWarning: The 'unary_union' attribute is deprecated, use the 'union_all()' method instead.
  m = towns[towns.intersects(seg.unary_union)].explore()
Make this Notebook Trusted to load map: File -> Trust Notebook