2  Vector layers

2.1 Loading packages

First, we import the two packages we will be working with:

  • pandas—for working with tables, and
  • geopandas—for working with vector layers,

as follows:

import pandas as pd
import geopandas as gpd

2.2 Table to point layer

In the first example, we are going to read an Excel file with x/y coordinates, and convert it to a (point) vector layer. The file bycode2019-2.xlsx contains town locations in Israel in 2019, downloaded from data.gov.il.

To read an Excel (.xlsx) file, we use the pd.read_excel function. The result is a data structure called DataFrame, used to represent a table in the pandas package:

dat = pd.read_excel('data/bycode2019-2.xlsx')
dat
שם יישוב סמל יישוב תעתיק ... שנה שם יישוב באנגלית אשכול רשויות מקומיות
0 אבו ג'ווייעד (שבט) 967 ABU JUWEI'ID ... 2019 Abu Juway'ad NaN
1 אבו גוש 472 ABU GHOSH ... 2019 Abu Ghosh NaN
2 אבו סנאן 473 ABU SINAN ... 2019 Abu Sinan NaN
... ... ... ... ... ... ... ...
1480 תראבין א-צאנע (שבט) 970 TARABIN AS-SANI ... 2019 Tarabin as-Sani' NaN
1481 תרבין א-צאנע (יישוב)* 1346 TARABIN AS-SANI ... 2019 Tarabin As-Sani 610.0
1482 תרום 778 TARUM ... 2019 Tarum NaN

1483 rows × 23 columns

The column named קואורדינטות (Hebrew for “coordinates”) contains the town coordinates, encoded as a single number:

dat['קואורדינטות']
0       2.040057e+09
1       2.105263e+09
2       2.160776e+09
            ...     
1480    1.830056e+09
1481    1.752658e+09
1482    1.983663e+09
Name: קואורדינטות, Length: 1483, dtype: float64

The following code section:

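  • filters out the rows with missing coordinate values, since these cannot be translated to point geometries, and
  • splits each 10-digit coordinate value (converted to a string) into X and Y values, taking the first five and the last five digits, respectively, and multiplying each by 10, according to the instructions that accompany the dataset.
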
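# Keep only rows with a coordinate value, and renumber the rows from 0
dat = dat[dat['קואורדינטות'].notna()].copy().reset_index(drop=True)
# First 5 digits, times 10, give the ITM X coordinate
dat['x'] = dat['קואורדינטות'].astype(str).str.slice(0,5).astype(float) * 10
# Next 5 digits, times 10, give the ITM Y coordinate
dat['y'] = dat['קואורדינטות'].astype(str).str.slice(5,10).astype(float) * 10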

As a result, we now have x and y numeric columns with the ITM coordinates of each town:

dat
שם יישוב סמל יישוב תעתיק ... אשכול רשויות מקומיות x y
0 אבו ג'ווייעד (שבט) 967 ABU JUWEI'ID ... NaN 204000.0 571000.0
1 אבו גוש 472 ABU GHOSH ... NaN 210520.0 634810.0
2 אבו סנאן 473 ABU SINAN ... NaN 216070.0 762840.0
... ... ... ... ... ... ... ...
1448 תראבין א-צאנע (שבט) 970 TARABIN AS-SANI ... NaN 183000.0 564000.0
1449 תרבין א-צאנע (יישוב)* 1346 TARABIN AS-SANI ... 610.0 175260.0 583690.0
1450 תרום 778 TARUM ... NaN 198360.0 632270.0

1451 rows × 25 columns
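The string slicing above is equivalent to integer arithmetic on the 10-digit code: the first five digits are the result of integer division by \(10^5\), and the last five are the remainder. The following is a minimal sketch, assuming every non-missing value has exactly 10 digits; the two sample codes are reconstructed from the x and y values of the first two rows shown above:

```python
import pandas as pd

# Two 10-digit coordinate codes, reconstructed from the x/y values
# of the first two rows above (Abu Juway'ad and Abu Ghosh)
codes = pd.Series([2040057100.0, 2105263481.0])

# String-based approach, as in the code above
x_str = codes.astype(str).str.slice(0, 5).astype(float) * 10
y_str = codes.astype(str).str.slice(5, 10).astype(float) * 10

# Equivalent integer arithmetic, assuming exactly 10 digits per code
as_int = codes.astype('int64')
x_int = (as_int // 100_000) * 10   # first five digits, times 10
y_int = (as_int % 100_000) * 10    # last five digits, times 10

print(x_int.tolist(), y_int.tolist())  # [204000, 210520] [571000, 634810]
```

The arithmetic version avoids the detour through strings, but the 10-digit assumption should be checked before relying on it.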

The two columns, along with the CRS definition (EPSG:2039 for ITM, the Israeli Transverse Mercator grid), can be transformed into a GeoSeries object. A GeoSeries is a sequence of geometries together with a CRS definition; when it is part of a vector layer, it is also known as the “geometry column”:

geom = gpd.points_from_xy(dat['x'], dat['y'], crs=2039)
geom = gpd.GeoSeries(geom)
geom
0       POINT (204000.000 571000.000)
1       POINT (210520.000 634810.000)
2       POINT (216070.000 762840.000)
                    ...              
1448    POINT (183000.000 564000.000)
1449    POINT (175260.000 583690.000)
1450    POINT (198360.000 632270.000)
Length: 1451, dtype: geometry
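Because the CRS is stored together with the geometries, it can be checked directly, and the .x and .y properties recover the point coordinates. Here is a small sketch using the first two towns from the table above:

```python
import geopandas as gpd

# Two points from the table above, in ITM (EPSG:2039)
geom = gpd.GeoSeries(
    gpd.points_from_xy([204000.0, 210520.0], [571000.0, 634810.0], crs=2039)
)

print(geom.crs.to_epsg())  # 2039
print(geom.x.tolist())     # [204000.0, 210520.0]
print(geom.y.tolist())     # [571000.0, 634810.0]
```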

One of the first things we might want to do with a GeoSeries is to .plot it, to examine what it looks like:

geom.plot();

Combining a GeoSeries with a corresponding table yields a GeoDataFrame object, a data structure representing a vector layer, where:

  • The GeoSeries is the geometric/spatial part
  • The DataFrame is the non-spatial/attributes part

For example, here we combine the GeoSeries named geom with the DataFrame named dat, resulting in a GeoDataFrame named pnt:

pnt = gpd.GeoDataFrame(data=dat, geometry=geom)
pnt
שם יישוב סמל יישוב תעתיק ... x y geometry
0 אבו ג'ווייעד (שבט) 967 ABU JUWEI'ID ... 204000.0 571000.0 POINT (204000.000 571000.000)
1 אבו גוש 472 ABU GHOSH ... 210520.0 634810.0 POINT (210520.000 634810.000)
2 אבו סנאן 473 ABU SINAN ... 216070.0 762840.0 POINT (216070.000 762840.000)
... ... ... ... ... ... ... ...
1448 תראבין א-צאנע (שבט) 970 TARABIN AS-SANI ... 183000.0 564000.0 POINT (183000.000 564000.000)
1449 תרבין א-צאנע (יישוב)* 1346 TARABIN AS-SANI ... 175260.0 583690.0 POINT (175260.000 583690.000)
1450 תרום 778 TARUM ... 198360.0 632270.0 POINT (198360.000 632270.000)

1451 rows × 26 columns
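The GeoSeries does not have to be created as a separate step. gpd.GeoDataFrame also accepts the output of gpd.points_from_xy directly as its geometry argument, together with a crs argument. The following minimal sketch uses a two-row stand-in for dat; the column name 'name' is hypothetical:

```python
import pandas as pd
import geopandas as gpd

# A two-row stand-in for the dat table ('name' is a hypothetical column)
df = pd.DataFrame({
    'name': ["Abu Juway'ad", 'Abu Ghosh'],
    'x': [204000.0, 210520.0],
    'y': [571000.0, 634810.0],
})

# Build the point layer in a single step
pnt = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['x'], df['y']), crs=2039)

print(pnt.geometry.name)  # geometry
print(pnt.crs.to_epsg())  # 2039
```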

Figure 2.1 summarizes the workflow of constructing a vector layer:

  • Individual geometries—shapely objects (package shapely)
  • Geometry column—GeoSeries objects
  • Vector layer—GeoDataFrame object

Figure 2.1: Creating a GeoDataFrame from scratch

Figure 2.2 shows the hierarchical structure of a GeoDataFrame, using another hypothetical vector layer with two points (London and Paris):

  • The entire vector layer is a GeoDataFrame object
  • The geometric part, i.e., the geometry column, is a GeoSeries object
  • Each cell in the geometry column, i.e., an individual geometry, is a shapely object (package shapely)

Figure 2.2: Structure of a GeoDataFrame
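The hierarchy in Figures 2.1 and 2.2 can also be reproduced and inspected in code. This is a minimal sketch that builds the two-point London/Paris layer from individual shapely objects. The coordinates are approximate longitude/latitude values in EPSG:4326:

```python
import geopandas as gpd
from shapely.geometry import Point

# Individual geometries: shapely objects (approximate lon/lat)
london = Point(-0.13, 51.51)
paris = Point(2.35, 48.86)

# Geometry column: a GeoSeries (geometries + CRS)
geom = gpd.GeoSeries([london, paris], crs=4326)

# Vector layer: a GeoDataFrame (attributes + geometry column)
layer = gpd.GeoDataFrame({'name': ['London', 'Paris']}, geometry=geom)

print(type(layer).__name__)                   # GeoDataFrame
print(type(layer.geometry).__name__)          # GeoSeries
print(type(layer.geometry.iloc[0]).__name__)  # Point
```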

2.3 Reading from file

Another common workflow with vector layers is to import an existing layer from a file. In the next example, we import a Shapefile named TAZ_NORTH_POPDENS_2016.shp. This is a polygonal layer with population density (POP_DENSE) estimates for Northern Israel. The Shapefile was downloaded from data.gov.il.

To import a vector layer from a file, we use the gpd.read_file function:

pol = gpd.read_file('data/TAZ_NORTH_POPDENS_2016.shp')
pol
OBJECTID POP_DENSE SHAPE_Leng ... POP2016 TAZ_AREA geometry
0 1 1425.666870 13215.852473 ... 9852 6.910450e+06 POLYGON ((217434.341 73558...
1 2 154.888062 21350.948435 ... 2748 1.774185e+07 POLYGON ((215109.748 73483...
2 3 115.392609 20113.127398 ... 2064 1.788676e+07 POLYGON ((212411.866 74531...
... ... ... ... ... ... ... ...
778 779 10459.399414 4026.527360 ... 6828 6.528099e+05 POLYGON ((227634.472 72406...
779 780 1758.489014 3316.818207 ... 1017 5.783374e+05 POLYGON ((226951.788 72387...
780 781 518.559326 20705.740930 ... 5568 1.073744e+07 POLYGON ((226132.518 72486...

781 rows × 7 columns

We can see that the geometry type, in this case, is POLYGON.
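Instead of reading the geometry type off the printout, you can check it with the geom_type property and read the CRS from .crs. Because the Shapefile may not be at hand, this sketch uses a small hypothetical in-memory layer. With the real data, the equivalent calls would be pol.geom_type.unique() and pol.crs:

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical stand-in for pol: two adjacent square polygons in ITM
pol = gpd.GeoDataFrame(
    {'POP_DENSE': [1425.7, 154.9]},
    geometry=[
        Polygon([(0, 0), (100, 0), (100, 100), (0, 100)]),
        Polygon([(100, 0), (200, 0), (200, 100), (100, 100)]),
    ],
    crs=2039,
)

print(pol.geom_type.unique().tolist())  # ['Polygon']
print(pol.crs.to_epsg())                # 2039
```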

2.4 Plotting

By default, plotting a GeoDataFrame using .plot shows an image of the geometric part:

pol.plot();

We can add symbology and a legend, based on an attribute, using the column and legend=True arguments. For example, here we display the population density estimates (the POP_DENSE attribute):

pol.plot(column='POP_DENSE', legend=True);

To modify the polygon outline style and the color palette, we can use the following additional arguments:

  • edgecolor='black'—Black lines
  • linewidth=0.1—Reduced line width
  • cmap='Reds'—The Reds color palette (from ColorBrewer)

pol.plot(
    column='POP_DENSE', 
    legend=True,
    edgecolor='black', 
    linewidth=0.1, 
    cmap='Reds'
);

To display more than one layer in the same plot, we need to:

  • store the first plot, which is a matplotlib Axes object, in a variable (e.g., base), and
  • pass it as the ax argument of any subsequent plot(s) (e.g., ax=base).

For example:

base = pol.plot(color='red', edgecolor='none')
pnt.plot(ax=base, markersize=0.5, color='black');
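An equivalent pattern, common in matplotlib code, is to create the Axes first with plt.subplots and pass it to every layer. This sketch uses small hypothetical layers as stand-ins for pol and pnt. The Agg backend line is only needed when running outside a notebook:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, for running outside a notebook
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical stand-ins for pol and pnt
pol = gpd.GeoDataFrame(geometry=[Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])], crs=2039)
pnt = gpd.GeoDataFrame(geometry=[Point(2, 3), Point(7, 8)], crs=2039)

# Create the figure and axes first, then draw each layer on the same axes
fig, ax = plt.subplots()
pol.plot(ax=ax, color='red', edgecolor='none')
pnt.plot(ax=ax, markersize=0.5, color='black')
```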

Another useful method to interactively examine the layer(s) is .explore. This creates an interactive map, with a background basemap for context. The parameters of .explore are mostly similar to .plot, with some differences (such as style_kwds for style settings):

pol.explore(
    column='POP_DENSE', 
    legend=True, 
    cmap='Reds',
    style_kwds={'color': 'black', 'weight': 1, 'fillOpacity': 0.5}
)

For more information about mapping with geopandas, see the Mapping and Plotting Tools tutorial.

2.5 Geoprocessing

geopandas provides the standard geoprocessing operators, using shapely and pyproj under the hood (which in turn are interfaces to the GEOS and PROJ software, respectively). For example:

  • CRS and reprojection—Transforming a given layer from one CRS to another
  • Numeric calculations—Calculating numeric geometry properties, such as length, area, and distance
  • New geometries—Creating new geometries, such as calculating buffers, or area of intersection
  • Geometric relations—Evaluating the relation between layers, such as whether two geometries intersect
  • Spatial join—Joining attributes from one layer to another, based on spatial relations
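The following compact sketch touches four of the operator groups listed above using hypothetical data: reprojection with to_crs, a buffer (new geometry), an intersects test (geometric relation), and gpd.sjoin (spatial join). The Haifa point uses its ITM coordinates from the pnt layer. The rectangular zone around it is made up for illustration:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Haifa in ITM (EPSG:2039), plus a hypothetical rectangular zone around it
haifa = gpd.GeoDataFrame({'name': ['Haifa']}, geometry=[Point(201120, 745440)], crs=2039)
zones = gpd.GeoDataFrame({'zone': ['A']}, geometry=[box(195000, 740000, 205000, 750000)], crs=2039)

# CRS and reprojection: ITM -> WGS84 longitude/latitude
lonlat = haifa.to_crs(4326)
lon, lat = lonlat.geometry.x.iloc[0], lonlat.geometry.y.iloc[0]
print(lon, lat)

# New geometries: a 1 km buffer around the point
buf = haifa.buffer(1000)

# Geometric relations: does the buffer intersect the zone?
print(buf.intersects(zones.geometry.iloc[0]).iloc[0])  # True

# Spatial join: attach the zone attributes to the point it contains
joined = gpd.sjoin(haifa, zones, predicate='within')
print(joined[['name', 'zone']])
```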

For example, the .area property gives the area of each geometry in squared CRS units (here \(m^2\), since ITM units are meters):

pol.area
0      6.910450e+06
1      1.774185e+07
2      1.788676e+07
           ...     
778    6.528099e+05
779    5.783374e+05
780    1.073744e+07
Length: 781, dtype: float64
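Because these measures are computed in CRS units, a quick sanity check on a geometry with known dimensions can be helpful. This minimal sketch uses a hypothetical 100 m by 200 m rectangle in ITM. Its area should be \(100 \times 200 = 20{,}000\) \(m^2\), and its perimeter, given by .length, should be \(2 \times (100 + 200) = 600\) \(m\):

```python
import geopandas as gpd
from shapely.geometry import box

# A hypothetical 100 m x 200 m rectangle in ITM (units: meters)
rect = gpd.GeoSeries([box(200000, 700000, 200100, 700200)], crs=2039)

print(rect.area.iloc[0])    # 20000.0
print(rect.length.iloc[0])  # 600.0
```

In a geographic CRS such as EPSG:4326, the same properties would return squared degrees and degrees, which is why a projected CRS such as ITM is used for these calculations.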

As a more complex example, let us calculate the distance between each polygon in pol and one point in pnt (Haifa). First, we select the feature we are interested in from pnt through “selection by attributes”, keeping the row where the town name (שם יישוב) is חיפה (Haifa). The resulting GeoDataFrame, named haifa, has one feature:

haifa = pnt[pnt['שם יישוב'] == 'חיפה']
haifa
שם יישוב סמל יישוב תעתיק ... x y geometry
544 חיפה 4000 HAIFA ... 201120.0 745440.0 POINT (201120.000 745440.000)

1 rows × 26 columns

From that feature, we can extract the first (and only) geometry. This is a shapely object, which a notebook displays as an image by default:

haifa['geometry'].iloc[0]

Now that we have:

  • a vector layer (pol), and
  • an individual geometry (haifa['geometry'].iloc[0]),

we can use the .distance method to calculate the distances from all polygons to the point. The result is a numeric Series, which we can immediately “insert” as a new attribute named "dist" in pol:

pol['dist'] = pol.distance(haifa['geometry'].iloc[0])
pol
OBJECTID POP_DENSE SHAPE_Leng ... TAZ_AREA geometry dist
0 1 1425.666870 13215.852473 ... 6.910450e+06 POLYGON ((217434.341 73558... 17556.437003
1 2 154.888062 21350.948435 ... 1.774185e+07 POLYGON ((215109.748 73483... 14726.885602
2 3 115.392609 20113.127398 ... 1.788676e+07 POLYGON ((212411.866 74531... 6678.954051
... ... ... ... ... ... ... ...
778 779 10459.399414 4026.527360 ... 6.528099e+05 POLYGON ((227634.472 72406... 33459.684072
779 780 1758.489014 3316.818207 ... 5.783374e+05 POLYGON ((226951.788 72387... 33493.912906
780 781 518.559326 20705.740930 ... 1.073744e+07 POLYGON ((226132.518 72486... 31011.148069

781 rows × 8 columns
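Note that .distance returns the shortest distance between geometries, so a polygon that contains the point gets a distance of 0. This minimal sketch uses two hypothetical 10 by 10 squares and a point inside the first one:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical squares; the point lies inside the first square
squares = gpd.GeoSeries([box(0, 0, 10, 10), box(20, 0, 30, 10)], crs=2039)
pt = Point(5, 5)

# Shortest distance from each polygon to the point
dist = squares.distance(pt)
print(dist.tolist())  # [0.0, 15.0]
```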

Plotting the resulting "dist" attribute shows the distribution of distances, which are between \(0\) and \(80,000\) \(m\) (i.e., \(80\) \(km\)):

base = pol.plot(column='dist', legend=True, cmap='Spectral')
haifa.plot(ax=base, color='blue', edgecolor='black', markersize=80);

2.6 Writing to file

To export a GeoDataFrame to a vector layer file, we use the .to_file method. The file format is automatically determined according to the chosen file extension, such as:

  • .shp—Shapefile
  • .gpkg—GeoPackage
  • .geojson—GeoJSON

For example, here is how we can export the pol layer to a GeoPackage file named pol.gpkg:

pol.to_file('pol.gpkg')

Figure 2.3 shows the exported file when opened in QGIS.

Figure 2.3: Exported GeoPackage (.gpkg) file viewed in QGIS

2.7 More information

See the Introduction to GeoPandas tutorial (Figure 2.4), and other sections in the geopandas documentation, for more information about geopandas.

Figure 2.4: The Introduction to GeoPandas tutorial