Optimizing for fast and scalable maps from data warehouses

Dashboards need data. But what's the best way to get this data from your datastore to a chart or map on your user's screen? How do you balance the competing demands of cost, user experience, and scalability? And what considerations need to be taken into account for map dashboards in particular?

I recently led the design and implementation of Honeycomb Maps' new SQL Parameters feature. In this blog post, I'll explore what motivated the design of this feature, and how it is implemented.

Goals: Fast, Up-to-Date, and Efficient

When users open a map, they expect it to contain the most recent data. When they're interacting with the map, they expect it to be snappy and responsive. And from a data team's perspective, data warehouse query costs also need to be taken into account.

There are a few common approaches to solving this problem.

Make the data warehouse larger

The frontend generates lots of queries in order to fetch data and aggregations for visuals. All of these queries go to a data warehouse — often hundreds per user session.
As the number of users grows, the number of queries grows as well, directly increasing data warehouse costs.
Because the frontend relies on the backend to produce the exact data that it needs, every unique filter/aggregation combination results in a new query that can't be cached.

Diagram showing a scaled-up data warehouse feeding a backend server, which feeds the user's computer — Scaling up the data warehouse to keep pace with query volume keeps the map fast, but costs grow directly with the number of users.

Build and serve data in tiles

To reduce strain on the data warehouse, you can build tilesets using tools like tippecanoe. Then you can serve these tiles efficiently from a blob store like S3, only fetching data that is needed for the user's current view.
This is fast and scales very well.
The downsides are that the tile-building process adds an additional step, meaning that data is often out-of-date. In addition, data needs to be pre-aggregated when baked into tiles, meaning that you must know exactly what types of aggregations the user will be interested in.

Diagram showing the data warehouse feeding a backend server that builds and serves tiles via a tile bucket to the user's computer — Pre-built tiles served from a blob store are fast and scalable, but the build step means data can be out-of-date and aggregations must be decided in advance.

Have the client handle interactive queries

Have the user grab all the data, and then process it on their computer.
This is how tools like ArcGIS, QGIS, or even Kepler.gl traditionally work — users get a complete data file, and then load it into the application.
This produces fast, highly responsive maps because there's no backend to communicate with — the user's computer handles the aggregation. It's also highly scalable: you can email a file with data to thousands of users and they can all look at it simultaneously, without any concerns.
The limitation here is primarily file size. After a certain amount of data, it becomes too big to effectively fit on users' machines, or to transfer to them over-the-wire.

Diagram showing the full dataset being transferred from the data warehouse through a backend server to the user's computer — Sending the full dataset to the client produces fast, scalable maps with no backend round-trips — but the dataset can only get so large before it's impractical to transfer.

Honeycomb Maps is designed primarily around the third option. It uses a specialized spatial database running in the browser to handle interactions with no network round-trips. Honeycomb fetches data once using a saved query when the map loads. The benefits are substantial — it can seamlessly scale to thousands of simultaneous users without missing a beat.

However, there is a key limitation: data sources were limited to 4GB (because Wasm uses 32-bit indexing). And besides that hard ceiling, moving 4GB (usually less, because of compression) to a client over-the-wire is slow.

Working with our customers, we realized that for almost all use cases, not all of the data was actually going to be visualized on the map. Instead, map users were typically focused on a specific section of the map that related to their work — like a city, a market area, or even a timestamp.

Pushing Filtering Down into the Data Warehouse

Loading a query might pull much more data than the user is actually interested in. This imposes additional load on the data warehouse, and also increases initial loading time.

The most logical solution is to perform a 'first cut' of data filtering at the data source itself (this is similar to the idea of predicate pushdown). This is designed to select only the rows that are relevant for the user — like a specific market, category, or timestamp. The solution here seems straightforward: just add a WHERE clause to the data source SQL.

Diagram showing a filtered dataset moving quickly from the data warehouse through a backend server to the user's computer — Filtering at the source sends only the rows the user cares about, reducing both warehouse load and initial loading time.

However, this led to another insight: the team creating the map is often not the same team that is using the map. For example, a central data team might be responsible for curating metrics and designing the map in Honeycomb, while distributed local market teams are the consumers of the map.

This means that the data team can't just include a static WHERE clause in the data source query, because it might need to change. And the teams using the software can't edit the query themselves, because they don't typically use SQL and don't have a full understanding of the underlying data model.

Parameterized SQL Queries Give Users Direct Control

To solve this, we've introduced SQL Parameters. SQL Parameters are placeholders that can be put into data source queries, and then exposed to the user as user interface components.

This lets data teams (map creators) maintain control over the queries that hit the warehouse, while opening up the opportunity for users to specify exactly what data they need.

When a creator is writing a query, they can include curly brackets {{ }} for parameter values. Parameters can have a default value, or be null. They can be text, numbers, or a date and time. If a parameter is null, the entire clause around the parameter can be omitted by using square brackets [[ ]].

The Honeycomb SQL query editor showing a parameterized WHERE clause and a SQL Parameters panel defining a state_parameter of type Text — Creators add parameters to a query with curly brackets, and wrap optional clauses in square brackets so they're omitted when a parameter is null.

After a parameter is created in the query editor, a user interface component can be added to the map which controls the parameter. This can be a free-form input, or constrained to certain ranges (for numbers) or certain values (for text).

When a user loads a map with a parameter, they can interact with the parameter control component to change the value, and then click 'Reload' to trigger a new query to the data warehouse.

A Honeycomb map of EV chargers in California with a parameter control component letting the user select a state from a drop-down and reload the map — Map users change a parameter through a simple control — here, picking a state from a drop-down — and click 'Reload' to fetch exactly the data they need.

For one of our customer's most popular maps, this will reduce the amount of data queried by 50x, resulting in reduced costs and drastically improved loading times.

I'm very excited about SQL Parameters because they unlock visualizations built on larger and more complex datasets, while maintaining the lightning-fast speed and scalability that's central to Honeycomb Maps.

If you have any questions, please feel free to reach out to me at carston@honeycombmaps.com or connect on LinkedIn.

- Carston