Why H3 is the Missing Piece for Integrating Location Data into BI Systems
H3 doesn't solve every geospatial problem, but for point-based events (common in the mobility & last-mile delivery industries), it allows location data to be easily integrated and used within existing business intelligence (BI) systems and processes.
What is H3?
- H3 is a discrete global grid system (DGGS) which covers the surface of the earth in hexagons of varying sizes (known as resolutions).
- Each hexagon has a unique ID, such as
8a283082a677fff
. - Hexagons are used because they are one of the only shapes that can be tiled without gaps, with the added benefit that each is an equal distance from the center of its neighboring hexagons.
More foundational information about H3 can be found in the official documentation at h3geo.org
H3 is very powerful for analysing data for 3 reasons:
- H3 effectively discretizes spatial information
- H3 can be processed efficiently by existing query systems
- H3 is being adopted quickly by BI vendors
H3 effectively groups points into bins
H3 effectively discretizes continuous spatial data. Broadly, data in a traditional relational data warehouse can be thought of as fitting into two categories: dimensions and measures. Dimensions are categorical fields like Vehicle Type
or Is Loyalty Member
. Measures are quantitative (numeric) fields like Price Paid
or Parcel Weight
.
Much of the traditional BI process involves taking raw data from business operations and then grouping by dimensions and aggregating by measures to produce various metrics that present an easy-to-understand overview of the business for decision-makers. Most BI tools on the market today are built around this process - from ‘Pivot Tables’ in Excel to dashboarding tools like Tableau and PowerBI.
When location data sits in a data warehouse, it is typically handled in two ways: as latitude and longitude columns that are represented as measures, or as text with a location meaning, such as a city name column (‘Paris’). Using text with a location meaning requires that locations are defined ahead of time and don’t change (what is the definition of ‘Paris’), limits the specificity of analysis, and may suffer from the modifiable areal unit problem.
Storing latitude and longitude coordinates as measures may be tempting because they are numbers, however, they do not provide any value when aggregated into metrics - ‘the average latitude of high-spending customers’ is not a very actionable business metric. A better approach is to convert latitude and longitude coordinates to an H3 index and treat this as a dimension within the data warehouse. The process of converting latitude and longitude coordinates to H3 can be thought of as ‘binning’ individual points into hexagon-shaped ‘bins’.
The ‘binning’ method is commonly used for creating dimensions from continuous variables, such as separating customers into quartiles based on their household income. Latitude and longitude could be binned in many different ways, such as by truncating their coordinates. However, using H3 for binning provides distinct advantages over other methods because H3 cells are equidistant from each other and have similar areas (within a region).
Once point data has been binned to H3 cells, H3 cell can be used as a dimension for which metrics are calculated from measures. For example, the total sales for each H3 bin is straightforward to calculate:
SELECT
h3_index,
sum(order_total) as total_sales
FROM order_data
GROUP BY h3_index
H3 is extremely efficient
H3 is uniquely suited to integrate location data into data warehouses because it can be processed very efficiently.
H3 cells are most commonly used as hexadecimal strings (8a283082a677fff
) but can also be represented as integers (622203769624952831
).
Integers are a primitive data type for most database systems, which have been highly optimized to store and compute them efficiently.
Contrast this with more complicated spatial data formats like GeoJSON or WKT, which require more space to store & complexity to work with.
In practice, computing statistics for H3 cells will be much faster and utilize far fewer computational resources than using traditional spatial functions like ST_CONTAINS()
. This makes it feasible to calculate metrics for larger areas, or update metrics more frequently. (In the next article we'll build a data warehouse with h3)
H3 is gaining broad support
Largely due to the reasons outlined above, H3 is quickly becoming a de-facto standard for incorporating spatial data into data warehouses. This is leading to a surge in support for H3 functions within BI tools.
The following data warehouses have implemented support for H3 functions within the past 2 years:
- Snowflake
- Databricks
- BigQuery (through an extension)
Additionally, H3 libraries are available for the most common programming languages, including Python, C/C++, Go, Javascript, and Rust.
Up-and-coming database systems like ClickHouse and DuckDB also have support for H3. This broad support for H3 is leading to increased adoption among businesses, driving further demand for H3 support in other tools.
Next: Building H3 into a data platform
H3 provides a way to cleanly and efficiency integrate location data with existing business intelligence systems and processes.
The next article is more concrete: a step-by-step guide to implementing an end-to-end business intelligence function with h3 at a logistics company.