Announcing Honeycomb: Towards modern geospatial analysis
Geospatial analysis is any type of analysis which deals with location:
- How many people live in this neighborhood?
- What is the average income of people in this country?
- How many competitor stores are within 1km of mine?
Geospatial data analysis, despite its ubiquity and relevance to almost every business, has largely developed on a separate track from mainstream data analysis tooling. Working with it requires specialized tools (PostGIS, ESRI, QGIS, Carto) and specialized knowledge (the DE-9IM model, spatial SQL). This has greatly limited the number of geospatial problems which can be solved.
In the past few years, there has been a movement from desktop-based GIS tools like QGIS and ESRI towards cloud-based tools like Carto and Unfolded. This follows an explosion in the popularity of cloud-based databases and the ‘modern data stack’. These tools are extremely powerful and allow analysis of once-unimaginable amounts of data from within a web browser. I’ve been lucky to lead projects deploying tools like these at large companies.
Modern location questions
However, in my experience, many geospatial problems are ad-hoc analyses which require combining different data sources and exploring the relationships between them. These are questions like:
- Where are the optimal places to install parking docks for shared bikes in the city of Berlin?
- How would new school boundaries in Minneapolis affect the social, economic, and racial diversity of different schools?
- Are there pockets of demand for a local grocery store which are not currently supplied?
To answer these questions, an ideal geospatial analysis tool should:
- Be intuitive and approachable for people without a geospatial background. Most of the time the person asking the question is a domain expert, but not a geospatial expert. Letting them answer questions themselves, instead of relying on a geospatial analyst, improves the quality of the answers and shortens the time it takes to get them. Tools like Google My Maps are great at this.
- Allow for interactive, iterative ‘what-if’ analysis, rather than a pure one-way dashboard view. Excel shines at this.
- Limit architectural complexity, for both security and cost. Kepler shines at this.
Existing geospatial tools like QGIS and ESRI can answer these questions, but they’re not approachable for the average user. New cloud-based tools like Carto and Unfolded excel at displaying large amounts of data in a predefined way (typically a map or analysis created by someone with a geo background). Because there is typically no semantic layer to map geospatial databases to dynamic user input, the data being displayed is largely predefined. As a result, there is a limited amount of dynamic analysis that end users can do themselves without involving someone more technical.
Honeycomb: an in-browser tool for powerful spatial analysis
Intuitive and approachable
Honeycomb is completely browser-based. This means that there is nothing to install and it can run in any modern web browser. The interface is entirely point-and-click, with analysis happening in real time. Behind the scenes, there is powerful geospatial analysis happening (areal interpolation, metric aggregations over regions, buffering), but this is abstracted behind a simple UI.
Interactive, question-driven analysis
Honeycomb uses the H3 hexagonal hierarchical geospatial indexing system to harmonize data from many sources (flat files, US Census, Eurostat to come). Once data is brought into the tool, it is converted (right now using areal weighted interpolation) to values at different H3 indexes. This means that disparate data sources (internal company data, public US Census/Eurostat demographic data, and 3rd party datasets) can be directly joined and used side-by-side.
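To make areal weighted interpolation concrete, here is a minimal Python sketch of the idea. It is not Honeycomb's implementation: axis-aligned rectangles stand in for real census polygons and H3 hexagons, and the names `areal_interpolate`, `tracts`, and `cells` are hypothetical. Each source value is split across the target cells in proportion to the share of the source's area that each cell covers, which assumes the value is spread evenly across the source region.

```python
from typing import Dict, Tuple

# (min_x, min_y, max_x, max_y). Rectangles are a simplifying assumption;
# real inputs would be census polygons and H3 hexagons.
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    """Area of a rectangle, or 0 if it is degenerate."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles, or 0 if they are disjoint."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def areal_interpolate(
    sources: Dict[str, Tuple[Rect, float]],
    targets: Dict[str, Rect],
) -> Dict[str, float]:
    """Redistribute source values onto targets, weighted by overlapping area."""
    result = {target_id: 0.0 for target_id in targets}
    for source_rect, value in sources.values():
        source_area = area(source_rect)
        if source_area == 0:
            continue  # a zero-area source has nothing to distribute
        for target_id, target_rect in targets.items():
            share = overlap_area(source_rect, target_rect) / source_area
            result[target_id] += value * share
    return result


# Two hypothetical census tracts with population counts.
tracts = {
    "tract_a": ((0, 0, 2, 2), 400.0),
    "tract_b": ((2, 0, 4, 2), 100.0),
}
# Three hypothetical target cells that cut across the tract boundaries.
cells = {
    "cell_1": (0, 0, 1, 2),
    "cell_2": (1, 0, 3, 2),
    "cell_3": (3, 0, 4, 2),
}

estimates = areal_interpolate(tracts, cells)
print(estimates)
```

Because the cells fully cover both tracts in this toy example, the interpolated values add up to the original total population. The even-spread assumption is the main weakness of the method: a tract that is half park and half apartment blocks will be split as if people lived uniformly across it.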
Taking this one step further, Honeycomb supports ‘derived layers’, which are new data layers based on the values of other layers. For example, a derived layer named ‘Coffee Shops Per Capita’ could be easily built by dividing the number of coffee shops in each H3 hexagon (from OpenStreetMap) by the total population in the same H3 hexagon (from US Census data).
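As a rough illustration of why a shared hexagon grid makes derived layers simple, the sketch below treats each layer as a mapping from cell ID to value and builds the per-capita layer with a plain key lookup. The function name `derive_ratio_layer`, the small integer cell IDs, and the counts are all made up for the example; they are not real H3 indexes or Honeycomb's API.

```python
from typing import Dict, Optional

# A layer maps a cell ID to a value. The IDs below are placeholders,
# not real H3 indexes.
Layer = Dict[int, float]


def derive_ratio_layer(numerator: Layer, denominator: Layer) -> Dict[int, Optional[float]]:
    """Divide one layer by another, cell by cell.

    Cells missing from the numerator count as 0. Cells with no positive
    denominator get None instead of raising a division error.
    """
    derived: Dict[int, Optional[float]] = {}
    for cell in numerator.keys() | denominator.keys():
        denom = denominator.get(cell, 0.0)
        if denom > 0:
            derived[cell] = numerator.get(cell, 0.0) / denom
        else:
            derived[cell] = None
    return derived


# Hypothetical inputs: shop counts (e.g. from OpenStreetMap) and
# population (e.g. from census data), already converted to the same grid.
coffee_shops = {101: 4.0, 102: 1.0, 103: 2.0}
population = {101: 2000.0, 102: 500.0, 104: 800.0}

coffee_per_capita = derive_ratio_layer(coffee_shops, population)
print(coffee_per_capita)
```

Cell 103 has shops but no recorded population, so it comes back as `None`; how to display such cells is a design decision for the tool, and this sketch simply flags them rather than guessing.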
Data doesn’t leave your machine
Honeycomb uses a modern OLAP (online analytical processing) database called DuckDB running in the browser to deliver fast results without using expensive cloud resources. OLAP databases are designed to quickly compute the results of queries which require aggregations over many rows of data, in contrast to OLTP (online transactional processing) databases, which are designed to handle many atomic transactions (inserting new rows, querying single rows). Because H3 indexes can be represented as integers, databases can perform aggregations on this data extremely efficiently.
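The integer representation is easy to see in a few lines of Python. H3 indexes are 64-bit values that are often written as hexadecimal strings; the sketch below converts such strings to integers and sums values per cell, which is the same shape of work as a `GROUP BY` aggregation in an OLAP database. This is only an illustration in plain Python, not how Honeycomb or DuckDB does it, and the index strings, values, and function names are hypothetical examples.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def h3_hex_to_int(cell: str) -> int:
    """Parse the hexadecimal string form of an H3 index into an integer."""
    return int(cell, 16)


def sum_by_cell(rows: Iterable[Tuple[str, float]]) -> Dict[int, float]:
    """Sum values per cell, keyed by the integer form of the index."""
    totals: Dict[int, float] = defaultdict(float)
    for cell, value in rows:
        totals[h3_hex_to_int(cell)] += value
    return dict(totals)


# Illustrative rows of (H3 index as hex string, value).
rows = [
    ("8928308280fffff", 120.0),
    ("8928308280bffff", 80.0),
    ("8928308280fffff", 30.0),
]

totals = sum_by_cell(rows)
for cell_id, total in totals.items():
    print(cell_id, total)
```

Grouping and joining on fixed-width integers avoids string comparisons and geometry operations entirely, which is what lets a columnar engine process these aggregations so quickly.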
Processing data on the client-side also has significant security benefits. Because data is not sent over the internet, an entire class of security threats (misconfigured authentication, broken encryption) is not relevant. This is especially important because location data can often be considered personally identifiable.