PostgreSQL + ClickHouse: The Dual-Database Pattern That Made 70M-Row Dashboards Instant
There's a moment every developer hits when they realise their PostgreSQL database is no longer the right tool for every job. For me, that moment came at GovChime — a government contracts intelligence platform ingesting millions of records from the SamGov API — when dashboard aggregation queries across 70M+ rows started timing out under real user load.
PostgreSQL is excellent. But it's an OLTP database — designed for transactional writes and precise row lookups, not for scanning tens of millions of rows and aggregating them on the fly. The question isn't whether to replace it; it's whether to augment it.
Why Not Just Add Indexes?
The instinct is to reach for indexes first. We had them. The problem wasn't missing indexes — it was the nature of the queries. Analytics dashboards need aggregations across the entire dataset: sum of contract values by agency, counts by NAICS code, trend lines over time. No index helps you when you need to touch 70 million rows.
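To make that concrete, here is the shape of query a dashboard runs (the table and column names are hypothetical, not GovChime's schema): a GROUP BY over the entire table. The sketch below performs the same aggregation in memory to show why no index helps — the sum needs every row.

```typescript
// Illustrative: a dashboard query is a full-table aggregation, e.g.
//   SELECT agency, SUM(value_usd) FROM contracts GROUP BY agency;
// A B-tree index can find rows fast, but it cannot avoid reading all
// of them when the result depends on all of them.

interface Contract {
  agency: string;
  valueUsd: number;
}

// In-memory equivalent of the GROUP BY above: every element is visited once.
function totalByAgency(rows: Contract[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.agency, (totals.get(row.agency) ?? 0) + row.valueUsd);
  }
  return totals;
}
```

At 70 million rows, this O(n) scan is what PostgreSQL executes on every dashboard load unless the result is precomputed.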
The two real options were:
- Materialized views — precompute aggregations on a schedule, query the result
- OLAP database — a purpose-built columnar store optimised for exactly this workload

We ended up using both.
The Dual-Database Architecture
PostgreSQL remains the source of truth. It handles all writes — syncing from the SamGov API via three SmartSync microservices — and all transactional reads where row-level precision matters (user queries, contract details, pagination).
ClickHouse sits alongside it as the analytics layer. It's a columnar store built for aggregation: data is stored column-by-column rather than row-by-row, which means summing or grouping across 70M rows is orders of magnitude faster than any row-based database.
The sync pipeline is one of the SmartSync microservices: it watches PostgreSQL for changes and syncs relevant data into ClickHouse asynchronously. Dashboards query ClickHouse. Everything else queries PostgreSQL.
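The routing rule can be sketched as a pure function (the query-kind names below are mine, not GovChime's): writes and row-precise reads go to PostgreSQL, full-dataset aggregations go to ClickHouse.

```typescript
// Hypothetical routing layer: each read pattern is served by exactly one store.
type Store = "postgres" | "clickhouse";

type QueryKind =
  | "contract_detail" // single-row lookup
  | "user_search"     // paginated, filtered rows
  | "dashboard_agg"   // SUM/COUNT over the whole dataset
  | "trend_line";     // time-bucketed aggregation

// Writes always hit PostgreSQL (the source of truth). Reads are routed
// by whether they need row-level precision or full-table aggregation.
function routeRead(kind: QueryKind): Store {
  switch (kind) {
    case "dashboard_agg":
    case "trend_line":
      return "clickhouse";
    default:
      return "postgres";
  }
}
```

The value of making this an explicit function rather than ad-hoc per-endpoint choices is that the boundary between the two stores stays auditable as the API grows.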
Materialized Views: Two Levels
I built 50+ materialized views across both databases:
- PostgreSQL materialized views — for mid-complexity queries that need fresh data but are too expensive to run live. These refresh on a schedule (every few minutes for high-traffic views, hourly for slower-moving data). They trade perfect freshness for consistent query performance.
- ClickHouse materialized views — for the heaviest aggregations. ClickHouse can update these incrementally as new data arrives, so they stay near-real-time without a full recompute. A dashboard showing contract awards by state across all 70M records loads in milliseconds from a ClickHouse MV.

What the Numbers Looked Like
Before: dashboard queries taking 4-8 seconds, sometimes timing out under concurrent load.
After: the same queries returning in 30-80ms from ClickHouse materialized views. That's roughly a 50-250x speedup depending on query complexity, and dashboards felt instant.
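For reference, a ClickHouse materialized view of the kind behind those numbers might look like this. This is a minimal sketch with hypothetical table and column names, embedded as DDL strings a setup script would run once; it is not GovChime's actual schema.

```typescript
// Hypothetical DDL, run once against ClickHouse. SummingMergeTree
// re-aggregates rows that share the same ORDER BY key during background
// merges, which is what keeps the view incremental as inserts arrive.
const AWARDS_BY_STATE_MV = `
CREATE MATERIALIZED VIEW IF NOT EXISTS awards_by_state_mv
ENGINE = SummingMergeTree()
ORDER BY (state)
AS
SELECT
    state,
    sum(award_value) AS total_awarded,
    count()          AS award_count
FROM contracts
GROUP BY state
`.trim();

// Dashboards still GROUP BY when reading, because merges are eventual —
// but the scan covers a few thousand pre-aggregated rows, not 70M.
const AWARDS_BY_STATE_QUERY = `
SELECT state, sum(total_awarded) AS total_awarded
FROM awards_by_state_mv
GROUP BY state
`.trim();
```

The re-aggregation on read is the one ClickHouse MV gotcha worth knowing: the view's rows are only guaranteed to be fully collapsed after a merge, so queries should always aggregate over the view rather than select from it raw.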
The architecture is not free. You're maintaining two databases, a sync pipeline, and view refresh logic. The operational overhead is real. But for an analytics product where query performance directly impacts whether users trust and return to the platform, it's the right call.
Key Takeaways
- Don't fight your OLTP database's constraints — route around them. PostgreSQL is not an analytics database. Adding ClickHouse alongside it is not admitting defeat; it's using the right tool for the job.
- Materialized views are underused. Even in pure PostgreSQL setups, precomputing expensive aggregations on a schedule beats running them live on every request.
- Sync pipelines are the critical path. The quality of your analytics layer depends entirely on how reliable and fresh your sync is. Treat it as a first-class service.
- Design for your query patterns first. Know what your dashboards need to display before choosing how to store the data.
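The "precompute on a schedule" takeaway can be as small as a cron-style refresher. Here is a minimal sketch, with hypothetical view names and cadences; the caller would execute `REFRESH MATERIALIZED VIEW CONCURRENTLY <name>` through its PostgreSQL client for each due view.

```typescript
// Hypothetical refresh schedule: high-traffic views every few minutes,
// slower-moving ones hourly, as described above.
interface ViewSpec {
  name: string;
  intervalMs: number;
  lastRefreshed: number; // epoch ms
}

// Pure decision function: which views are due for a refresh right now?
// Note: CONCURRENTLY keeps the view readable during the refresh, at the
// cost of requiring a unique index on the materialized view.
function dueForRefresh(views: ViewSpec[], now: number): string[] {
  return views
    .filter((v) => now - v.lastRefreshed >= v.intervalMs)
    .map((v) => v.name);
}
```

Keeping the scheduling decision pure and separate from the database call makes the refresh cadence trivially testable, which matters once you have 50+ views with different freshness requirements.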