The term “HTAP” is the holy grail of database systems. It describes what every data engineer would love: Being able to do all your data Processing, no matter if it’s complex Analytics or fast-paced Transactional operations in a single Hybrid system.
Many system have tried to enable HTAP but have found that building a truly hybrid system is an impossible challenge. The keynote talk at Databricks Data + AI Summit 2026 highlighted that it is impossible to get a query on the analytical warehouse to execute in less than 1 second. As a new solution that solves this, the Databricks cofounder Reynold Xin announced “Databricks LTAP” and directly cited our paper on “Morsel-Driven Parallelism” as one of the “latest and coolest academic papers”:
With their LTAP offering, Databricks is actually able to offer sub-second transactional performance while maintaining its well-known analytical performance. That’s genuinely impressive! But if you look under the hood (e.g. by watching Databricks engineers talk about the technology behind LTAP), you will see that LTAP still uses the same classical separation between operational and analytical data. By their own description, LTAP does not run on one engine. It keeps a transactional engine and an analytical engine and unifies them at the storage layer. You can think of this being a really good, really fast zero-ETL system.
Zero-ETL is not enough, though. As long as you run two engines, you have two sources of truth, and there is a moment where data crosses from one to the other. Zero-ETL makes that gap small, but even if the name suggests otherwise, it cannot make it zero. The data still has to move from the system that wrote it to the system that reads it. When you really want HTAP, you really care about this gap being actually zero. Take fraud detection: a warehouse can flag suspicious activity only after the fact, but you want to catch it before the money moves. For zero read lag and a single source of truth, you have to re-think the entire system around one engine that runs both workloads natively.

It’s almost impossible to change an existing system designed for either transactions or analytics into a true hybrid system. Database researchers have known this for several decades already. We weren’t the first to attempt it, and we didn’t coin the term HTAP. Systems like SAP HANA and HyPer went after it before us but required your data to fit completely in main memory. Sadly, this didn’t work out. What made us reconsider that HTAP was back on the table was that fast SSDs became widely available. So ten years ago, we started Umbra, a research project with one goal: building a truly HTAP system. CedarDB is built on that foundation.
Since then, new developments in the database space focused only on analytics, leading to great analytical systems such as Databricks Lakehouse and Snowflake, and ClickHouse. It turns out the existing transactional systems, even regular PostgreSQL, scaled to even the most demanding AI workloads. What’s hard is making sure transactions and analytics don’t slow each other down when running at the same time.
To make this work well, you need to unify both the execution engine and the storage format without introducing new bottlenecks. For that, we built a hybrid column-row format as our data layer. It can support fast writes on hot data, automatically transforming between hot write-optimized and cold compressed data as needed, fully transparently as a single copy.
Not only that, we also built the foundations for fast analytical processing on modern hardware. You can find an overview of key techniques on our technology page, including morsel-driven parallelism, data-centric code generation, a cost-based optimizer with full subquery decorrelation, and a buffer manager designed to fully utilize fast SSDs.
Databricks LTAP is coming soon, CedarDB is in production today! Good to see the industry catching up to the problem. Come see what the answer looks like when it’s already running.
