Tutorial: Mapping parcel lockers with OpenStreetMap and DuckDB (Part 1)
Parcel lockers, which allow users to pick up (and often drop off) packages at an automated kiosk, have grown in popularity in recent years. They offer benefits for both customers and delivery companies:
- For customers, parcel lockers are a convenient way to securely receive packages when they won't be home at delivery time.
- For delivery companies, parcel lockers lower costs by reducing the number of stops couriers need to make and by cutting down on undeliverable packages.
We recently posted an interactive Honeycomb Map of all of the parcel lockers in Berlin, Germany. Here is an embedded version of the map, which uses Honeycomb Data Explorer to allow you to filter the data by operator:
This tutorial (Part 1) will guide you through using OpenStreetMap and DuckDB to find and organize data about parcel lockers in Berlin. While the tutorial is focused on parcel lockers in Berlin, the techniques and data sources described can be applied to many types of infrastructure worldwide.
Part 2 (coming soon!) will show you how to use Honeycomb to build an interactive map of this data.
Prologue: Explore Overture and OpenStreetMap as potential data sources
When starting any data project, the initial step is often exploring what data is available. In this section, I describe how I investigated two open data sources - Overture Maps and OpenStreetMap. If you are just interested in how to fetch the parcel locker data, you can skip ahead to the next section.
For this project I was interested in public sources of parcel locker locations with licensing that allows commercial reuse. Even though parcel locker locations are by definition public, none of the large locker operators in Berlin (DHL Packstation and Amazon) publish lists of their locker locations. Since first-party data wasn't available directly from the operators, I decided to explore public data sources instead.
Overture Maps
The first data source I explored was the Overture Maps Foundation. Overture Maps is a relatively new collaborative map data project with data contributed by partners including Meta, Amazon, and Microsoft.
Overture’s data consists of five different themes:
- Base land and water
- Administrative boundaries
- Buildings
- Places
- Transportation
I was hopeful parcel lockers would be part of the places theme, which includes POIs like gas stations, grocery stores, and restaurants.
Overture does not provide an easy-to-use interface for exploring its data. Instead, it publishes very large files to object storage, along with documentation on how to read them. The March 2024 release is located at s3://overturemaps-us-west-2/release/2024-03-12-alpha.0 and is organized as a series of Parquet files.
Rather than downloading these files directly, I chose to use DuckDB running on my MacBook to query the data. DuckDB runs locally, parses each query, and then fetches only the data it needs from the remote source. This means that, although the remote files are many gigabytes in size, DuckDB only reads a small part of them.
First, start the command-line (CLI) version of DuckDB:
# start duckdb from the macos terminal
duckdb
Note: The command to start the DuckDB CLI (command line interface) might be slightly different on your computer. Take a look at the DuckDB documentation for more complete instructions. However, once you get the DuckDB CLI started, the SQL commands used in the rest of the tutorial are universal.
With DuckDB running, run the following commands to set up the DuckDB instance to read the remote Overture data files:
-- set up duckdb to query overture data
INSTALL spatial;
INSTALL httpfs;
LOAD spatial;
LOAD httpfs;
SET s3_region = 'us-west-2';
Next, I wanted to find out whether parcel lockers were included in the Overture places data. Unfortunately, providing a list of the available categories is marked as a 'TODO' in the official documentation.
However, since you have access to the data, you can find a list of available categories by running a query to select distinct category names:
-- sql query to select distinct category names and save them to a csv file
COPY (
    SELECT DISTINCT categories.main
    FROM read_parquet(
        's3://overturemaps-us-west-2/release/2024-03-12-alpha.0/theme=places/type=*/*',
        filename = true,
        hive_partitioning = 1
    )
) TO 'categories.csv' WITH (FORMAT CSV, DELIMITER ',');
This query should take a minute or two to complete and will produce a CSV file in your current working directory. You can then open this file in any spreadsheet software (Excel, Numbers, Google Sheets) and look for a potential parcel locker category.
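Rather than scanning the file by eye, you can also search the exported list directly in DuckDB. Here is a small sketch that assumes the distinct-categories column is named main, as in the query above:
-- search the exported category list for anything locker- or parcel-related
SELECT *
FROM 'categories.csv'
WHERE main ILIKE '%locker%' OR main ILIKE '%parcel%';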
Unfortunately, there is no parcel_locker category. When I ran this, the closest I could find was 'post_office' (a parcel locker could be considered a type of private post office, right?).
If you want to take a closer look at the locations in the 'post_office' category, you can run a query for all of the places with a main category of post_office:
-- sql query to select all the overture places which have a category of 'post_office'
-- the results will include separate latitude and longitude columns
-- the results will be saved to a csv file
COPY (
    SELECT
        *,
        ST_X(ST_GeomFromWKB(geometry)) AS longitude,
        ST_Y(ST_GeomFromWKB(geometry)) AS latitude
    FROM read_parquet(
        's3://overturemaps-us-west-2/release/2024-03-12-alpha.0/theme=places/type=*/*',
        filename = true,
        hive_partitioning = 1
    )
    WHERE categories.main = 'post_office'
) TO 'post_offices.csv' WITH (FORMAT CSV);
This query will probably take a few minutes or more to run, depending on the speed of your internet connection. After it finishes, you can check how many rows are in the CSV file which was just created:
SELECT COUNT(*) FROM 'post_offices.csv';
107k post offices around the world - pretty cool!
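If you want to spot-check the results around a particular city without a mapping tool, you can filter the CSV to a rough bounding box directly in DuckDB. This is just a sketch - the coordinates below only approximate the Berlin area:
-- filter the post_office results to an approximate Berlin bounding box
SELECT *
FROM 'post_offices.csv'
WHERE latitude BETWEEN 52.33 AND 52.69
  AND longitude BETWEEN 13.08 AND 13.77;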
At this point, I dropped this data into Honeycomb to do a visual check of whether this data included some parcel lockers near my apartment, or whether it’s just ‘classic’ post offices. Unfortunately, after looking at the data, it’s just regular post offices (at least in the US and Germany). So it looks like we will not be able to get the data needed for this project from Overture.
OpenStreetMap
The second data source I investigated for parcel locker data was OpenStreetMap (OSM). OpenStreetMap is a community-maintained worldwide map.
I have used OpenStreetMap data in previous jobs, and I found that data quality depends on the number of contributors in an area. In dense urban areas with a tradition of support for open-source software (like Berlin, Germany) the quality is often phenomenal, with individual trees mapped along with their specific species. However, in other places with fewer community contributors the data can be more limited.
To find whether the location of parcel lockers was available from OpenStreetMap, I started by trying to find a locker near my apartment. I went to openstreetmap.org, typed in my address, and then clicked on ‘Layers’ > ‘Map Data’ so that I could see every individual object on the map.
You can follow a similar process to try and identify whether places of interest near you are present in OpenStreetMap.
In this case, OSM has the DHL Packstation near me, and parcel lockers have their own amenity tag in OSM: parcel_locker. So it looks like OpenStreetMap has the data we need to make a map of all the parcel lockers in Berlin.
Step 1: Download data from OpenStreetMap
We know that OpenStreetMap (OSM) has data about parcel lockers. The next step is to search for all parcel lockers in Berlin within the OSM data.
OSM has its own API for querying data, called the Overpass API. The Overpass API has its own query language, called OverpassQL.
Personally, I find the OverpassQL syntax a bit complicated. However, there is a great online tool called Overpass Turbo that can generate an OverpassQL query for you. Overpass Turbo will then run the query against publicly available Overpass servers and display the results on a map.
Here is an OverpassQL query that Overpass Turbo generated for me:
/*
This has been generated by the overpass-turbo wizard.
The original search was:
“amenity=parcel_locker”
*/
[out:json][timeout:25];
// gather results
nwr["amenity"="parcel_locker"]({{bbox}});
// print results
out geom;
Copy and paste this query into Overpass Turbo and click 'Run'.
After running this query with the map centered on Berlin, I got the following map of 749 parcel lockers in Berlin:
This looks like the exact data we are looking for!
You can easily download the Overpass Turbo results as a file by clicking ‘Export’ > ‘Data’ > ‘GeoJSON’ > ‘Download’.
You should now have a GeoJSON file containing the locations of 749 parcel lockers in the Berlin (and Brandenburg) area.
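If you prefer the command line to the browser, you can also send essentially the same query straight to a public Overpass endpoint. This is a rough sketch: the bounding box only approximates Berlin, and the response is Overpass JSON rather than the GeoJSON file exported above, so the Overpass Turbo export remains the easier path for the next step.
# post the query to a public Overpass API server (bounding box is approximate)
curl -o berlin_parcel_lockers.json \
  --data '[out:json][timeout:25];nwr["amenity"="parcel_locker"](52.33,13.08,52.69,13.77);out geom;' \
  https://overpass-api.de/api/interpreter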
Step 2: Use DuckDB to convert GeoJSON data to table data for analysis
GeoJSON, being an extension of JSON, is a hierarchical data structure built on sets of key-value (name-value) pairs. In our parcel locker GeoJSON file, each parcel locker is represented by a GeoJSON 'Feature', which has both geometry data and a set of properties such as operator and name.
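For illustration, a single Feature in this file looks roughly like the snippet below. The tag values here are made up, but they follow the amenity=parcel_locker tagging we saw earlier; note that in a GeoJSON Point, the coordinates are ordered longitude first, then latitude.
{
  "type": "Feature",
  "properties": {
    "amenity": "parcel_locker",
    "operator": "DHL",
    "brand": "DHL Packstation",
    "opening_hours": "24/7"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [13.3777, 52.5163]
  }
}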
GeoJSON is a popular format for exchanging geospatial data in web applications, however, it is not a common format for data analysis. GeoJSON can’t be opened in a tool like Excel or Google Sheets because it is not a tabular format - each feature may have a different number of properties, or even nested properties.
However, for this project we want our data in a tabular format so we can do basic analysis like counting the number of lockers that each operator maintains. To do this we will use DuckDB along with the DuckDB Spatial Extension.
First, start the CLI version of DuckDB:
# start duckdb from the macos terminal
duckdb
With the DuckDB CLI running, install and load the spatial extension:
-- duckdb sql commands to install and load the spatial extension
INSTALL spatial;
LOAD spatial;
Next, load the GeoJSON data from the previous step into a table using the ST_Read() function from the DuckDB Spatial Extension. This function uses the GDAL translator library behind-the-scenes to easily convert GeoJSON to a DuckDB table. This type of syntax is known as Create Table As Select (CTAS).
CREATE TABLE lockers AS SELECT * FROM ST_Read('berlin_parcel_lockers.geojson');
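As a quick sanity check, you can confirm that the number of rows matches the number of features you exported (749 in my case):
-- count the rows loaded from the GeoJSON file
SELECT COUNT(*) FROM lockers;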
You can then see which columns have been created from the GeoJSON file by using the DESCRIBE TABLE command:
DESCRIBE TABLE lockers;
By default, DuckDB only shows 40 rows of results. To show more, use the .maxrows command:
.maxrows 100
Now you should be able to see all the columns. Because GeoJSON features may have different properties, DuckDB has created a column for every property that exists in one of the GeoJSON features. Most rows have null values for the majority of properties.
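If you are curious how sparse the data actually is, a quick sketch like the one below counts how many rows have a value for a few of the properties we will use in the next step (the exact column names depend on the tags present in your export):
-- count non-null values for a few properties of interest
SELECT
    COUNT(*) AS total_rows,
    COUNT(operator) AS has_operator,
    COUNT(brand) AS has_brand,
    COUNT(opening_hours) AS has_opening_hours
FROM lockers;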
Now, create a new table that only contains core parcel locker information. This query also uses the Create Table As Select (CTAS) syntax that you used above.
CREATE TABLE lockers_clean AS (
    SELECT
        brand,
        name,
        opening_hours,
        operator,
        parcel_pickup,
        parcel_mail_in,
        ST_X(ST_Centroid(geom)) AS longitude,
        ST_Y(ST_Centroid(geom)) AS latitude
    FROM lockers
);
The query above selects the relevant columns and creates separate longitude and latitude columns. The geometry is wrapped in the ST_Centroid() function because some parcel lockers are modeled as polygons in OSM rather than points; for this analysis we just need a single point per locker, not the polygon outline.
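With the clean table in place, you can already run the kind of basic analysis mentioned earlier - for example, counting how many lockers each operator maintains. Note that operator will be null for any locker whose OSM entry does not include that tag:
-- count lockers per operator
SELECT operator, COUNT(*) AS locker_count
FROM lockers_clean
GROUP BY operator
ORDER BY locker_count DESC;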
From here, copy the clean table to either a CSV or Parquet file:
-- duckdb sql query to copy data to a CSV file
COPY lockers_clean TO 'lockers_clean.csv' WITH (FORMAT CSV);
-- duckdb sql query to copy data to a Parquet file
COPY lockers_clean TO 'lockers_clean.parquet' WITH (FORMAT PARQUET, COMPRESSION ZSTD);
This data can then be pulled into various analysis tools, including Excel, Google Sheets, Tableau, or geospatial tools like Kepler.gl.
Parquet is an alternative table data format that is highly efficient to read and write. Parquet allows you to store larger amounts of data in smaller files. However, unlike CSV, it is not human-readable, meaning that you cannot open it in text editors or software like Excel.
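As a quick check that the Parquet file was written correctly, you can read it straight back into DuckDB:
-- preview the first few rows of the exported Parquet file
SELECT * FROM 'lockers_clean.parquet' LIMIT 5;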
Honeycomb supports both CSV and Parquet data. In Part 2 of this series (coming soon), I will show you how to create a high-performance interactive map of this data using Honeycomb Maps.
Extend your analysis
While the amount of data in this tutorial was relatively small (749 lockers), this technique can form the basis for much deeper analysis. OpenStreetMap provides data worldwide, although coverage and quality can vary.
- For logistics companies: This data could be combined with internal company data, for example the location of packages that could not be delivered successfully, to build a strategy for efficiently deploying new parcel lockers.
- For e-commerce businesses: This data could be used to evaluate shipping providers based on their locker coverage.
- For local governments: This data can inform zoning and land use strategy, for example whether to permit parcel locker locations on public land.
OpenStreetMap and Overture also provide a wealth of open data related to many other types of places, including bike parking, different types of stores, administrative boundaries, and much more. They can be a great resource for companies which are looking to incorporate external data without relying on a paid data provider.
If you’re working on these topics as well, it would be great to get in touch! You can find me on LinkedIn, or send me an email at carston@mapsanddata.com.