CedarDB Stands on the Shoulders of Giants

Umbra, the research system on which CedarDB is built, is widely recognized across industry and academia. CedarDB's performance has been demonstrated in numerous peer-reviewed research publications by its founders.

Features

Purpose-Built for Modern Data Processing

CedarDB redefines data processing by combining scalable SQL analytics with robust transaction processing. Its analytics-optimized, updatable data format enables fast query processing without transactional compromises. Engineered to exploit the full speed of NVMe SSDs as well as the bandwidth and flexibility of cloud storage, CedarDB delivers high performance on both.

Cloud-Native Architecture

Experience exceptional efficiency with CedarDB's S3 optimizations and seamless live migration of running queries. Native support for Parquet and JSON data provides a versatile, integrated data environment. Combine streams and relations effortlessly and change the way you derive insights with CedarDB.

Computational Database beyond SQL for ML & AI

CedarDB offers an efficient approach to user-defined functions. The system automatically parallelizes user functions, integrates them deeply into the generated code, and executes them with morsel-driven parallelism. This makes it easy to extend CedarDB with advanced operations such as gradient descent and k-means clustering, which speeds up today's ML and AI pipelines.

Fully Parallel Query Execution

Our query execution is designed to scale to hundreds of cores and take full advantage of parallel compute resources. With morsel-driven parallelism, we use contention-free parallel algorithms that scale nearly perfectly for large analytical queries while keeping latency low for short transactional queries.
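The following Python sketch illustrates the scheduling idea behind morsel-driven parallelism: the input is split into small fixed-size morsels, each worker repeatedly claims the next unprocessed morsel and aggregates it into thread-local state, and the partial results are merged at the end. The function name and the default morsel size are hypothetical, and a lock stands in for the atomic fetch-and-add a native engine would use on the shared cursor; this is not CedarDB's implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, num_workers=4, morsel_size=10_000):
    """Sum `values` with morsel-driven scheduling (hypothetical helper):
    workers repeatedly claim the next fixed-size morsel until the input is exhausted."""
    cursor = 0  # start index of the next unclaimed morsel
    cursor_lock = threading.Lock()  # stands in for an atomic fetch-and-add

    def claim_morsel():
        nonlocal cursor
        with cursor_lock:
            begin = cursor
            cursor += morsel_size
        if begin >= len(values):
            return None  # no work left
        return begin, min(begin + morsel_size, len(values))

    def worker():
        local_total = 0  # thread-local partial aggregate, no shared writes
        while (morsel := claim_morsel()) is not None:
            begin, end = morsel
            local_total += sum(values[begin:end])
        return local_total

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(worker) for _ in range(num_workers)]
        # Merge phase: combine the per-worker partial results.
        return sum(f.result() for f in futures)
```

Because workers claim morsels on demand, a slow worker simply processes fewer morsels instead of stalling the whole query. CPython's global interpreter lock prevents real parallel speedup for this pure-Python loop, so the sketch shows the work distribution, not the performance.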

Seamless Buffer Management with In-Memory Performance

CedarDB uses a low-overhead buffer manager based on LeanStore to support larger-than-memory workloads. It delivers in-memory performance when the data is cached in main memory and degrades gracefully when the data does not fit into RAM.

Data-Centric Code Generation

CedarDB generates optimized code for every query, even for complex expressions, using a custom intermediate representation designed for low-latency query execution. Generating machine code directly lets query execution start instantly, while long-running queries are automatically accelerated with the LLVM optimizing compiler.
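As a rough illustration of data-centric code generation, the Python sketch below generates source code for one hypothetical query, `SELECT SUM(b) FROM t WHERE a > 10`, in which scan, filter, and aggregation are fused into a single loop, and then compiles it with Python's built-in `compile` and `exec`. Using Python source as the intermediate representation is an assumption made for brevity, and the function name is hypothetical; CedarDB's custom IR, its direct machine-code backend, and its LLVM tier are not modeled.

```python
def compile_filter_sum(filter_column, threshold, sum_column):
    """Generate one fused loop for the hypothetical query
    SELECT SUM(sum_column) FROM t WHERE filter_column > threshold."""
    # Scan, filter, and aggregation are emitted as a single loop body,
    # so each tuple is fully processed before the next one is read.
    source = (
        "def query(table):\n"
        "    total = 0\n"
        "    for row in table:\n"
        f"        if row[{filter_column!r}] > {threshold!r}:\n"
        f"            total += row[{sum_column!r}]\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(source, "<generated>", "exec"), namespace)
    return namespace["query"]


query = compile_filter_sum("a", 10, "b")
rows = [{"a": 5, "b": 1}, {"a": 11, "b": 2}, {"a": 20, "b": 3}]
```

Each tuple flows through the whole pipeline within one loop iteration, and the running total lives in a local variable instead of being passed between operator objects. Interpolating identifiers with `repr` is only acceptable here because the inputs are trusted.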

No Transactional Compromises

CedarDB guarantees ACID transactions through multi-version concurrency control (MVCC) optimized for in-memory operation. It also uses full-precision fixed-point arithmetic and checks every arithmetic operation for overflow, so results are always correct.
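A minimal Python sketch of fixed-point arithmetic with overflow checking: decimals are stored as integers scaled by a power of ten, additions and multiplications are exact, and any result outside an assumed signed 64-bit range raises an error instead of silently wrapping around. The class and function names and the 64-bit bound are illustrative assumptions; CedarDB's actual numeric representation is not described here.

```python
from decimal import Decimal

# Assumed storage width: signed 64-bit integers.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def checked(raw):
    """Reject any result that does not fit the assumed 64-bit representation."""
    if not INT64_MIN <= raw <= INT64_MAX:
        raise OverflowError("numeric value out of range")
    return raw


class FixedPoint:
    """Exact decimal stored as a scaled integer: value = raw / 10**scale."""

    def __init__(self, raw, scale):
        self.raw = checked(raw)
        self.scale = scale

    @classmethod
    def parse(cls, text, scale):
        scaled = Decimal(text).scaleb(scale)
        if scaled != scaled.to_integral_value():
            raise ValueError(f"{text} has more than {scale} fractional digits")
        return cls(int(scaled), scale)

    def _rescaled(self, scale):
        # Widening the scale multiplies by a power of ten, which can overflow too.
        return checked(self.raw * 10 ** (scale - self.scale))

    def __add__(self, other):
        scale = max(self.scale, other.scale)
        return FixedPoint(self._rescaled(scale) + other._rescaled(scale), scale)

    def __mul__(self, other):
        # Scales add up, so the product is exact; the constructor checks the range.
        return FixedPoint(self.raw * other.raw, self.scale + other.scale)

    def __str__(self):
        sign = "-" if self.raw < 0 else ""
        whole, frac = divmod(abs(self.raw), 10 ** self.scale)
        return f"{sign}{whole}.{frac:0{self.scale}d}" if self.scale else f"{sign}{whole}"
```

With this representation, `0.1 + 0.2` yields exactly `0.30`, whereas binary floating point gives `0.30000000000000004`.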

Versatile Join Implementations

CedarDB's data structures are designed for parallel processing that scales to hundreds of cores. Groupjoins compute aggregates efficiently, worst-case optimal joins handle complex queries on graph-structured data, and range joins efficiently evaluate queries with conditions on locations or time intervals.
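The groupjoin idea can be sketched in Python for a query of the form `SELECT c.*, SUM(o.amount) FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.id`: because the join key is also the grouping key and is unique on the build side, a single hash table serves both the join and the aggregation, and probe tuples are aggregated in place without materializing the join result. The function and table names are hypothetical, and the sketch covers only this simple case.

```python
def groupjoin_sum(build_rows, probe_rows, build_key, probe_key, probe_value):
    """Hypothetical groupjoin: join on a unique build-side key and SUM a probe column."""
    # Build phase: one hash-table entry per build tuple, with aggregate slots.
    groups = {}
    for row in build_rows:
        key = build_key(row)
        if key in groups:
            raise ValueError("this sketch requires unique build-side keys")
        groups[key] = {"row": row, "sum": 0, "count": 0}

    # Probe phase: update the matching group directly; no join result is materialized.
    for row in probe_rows:
        group = groups.get(probe_key(row))
        if group is not None:
            group["sum"] += probe_value(row)
            group["count"] += 1

    # Drop groups without any match to keep inner-join semantics.
    return [(g["row"], g["sum"]) for g in groups.values() if g["count"] > 0]


customers = [(1, "a"), (2, "b"), (3, "c")]
orders = [(1, 10), (1, 5), (2, 7), (4, 100)]  # (customer_id, amount)
result = groupjoin_sum(customers, orders, lambda c: c[0], lambda o: o[0], lambda o: o[1])
```

A separate join followed by a GROUP BY would first produce one row per order and then hash those rows again; the groupjoin avoids both the intermediate result and the second hash table.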

Advanced Statistics

To estimate result sizes and query costs precisely, CedarDB combines sketches and sampling. Reservoir sampling keeps statistics up to date even during live inserts. In addition, numerical statistics can estimate derived values of aggregates.
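A minimal Python sketch of reservoir sampling (the classic Algorithm R) shows how a fixed-size uniform sample can be maintained while tuples keep arriving, and how it can feed a simple cardinality estimate. The class name, the estimator, and the choice of Algorithm R are assumptions; the text does not specify which variant CedarDB uses or how it combines samples with sketches.

```python
import random


class ReservoirSample:
    """Uniform fixed-size sample of an insert stream (Algorithm R)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.sample = []
        self.seen = 0  # number of tuples inserted so far
        self.rng = random.Random(seed)

    def insert(self, item):
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(item)  # fill phase: keep every tuple
        else:
            # Keep the new tuple with probability capacity / seen,
            # replacing a uniformly chosen existing sample entry.
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.sample[slot] = item

    def estimate_count(self, predicate):
        """Scale the fraction of sampled tuples matching `predicate` to all inserted tuples."""
        if not self.sample:
            return 0.0
        matches = sum(1 for item in self.sample if predicate(item))
        return self.seen * matches / len(self.sample)
```

After n inserts (with n at least the capacity), every tuple seen so far is in the sample with the same probability capacity / n, so the estimate stays unbiased as the table grows, without rescanning it.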

Complex SQL Queries

CedarDB efficiently executes arbitrarily complex SQL queries in the PostgreSQL dialect. Our advanced optimization techniques deliver excellent performance for complex correlated subqueries, advanced window functions, and complex nested types such as JSON.

Key Publications

Umbra: A Disk-Based System with In-Memory Performance, CIDR 2020

Thomas Neumann and Michael Freitag | January 13, 2020

Exploiting Cloud Object Storage for High-Performance Analytics, VLDB 2023

Dominik Durner, Viktor Leis, and Thomas Neumann | August 31, 2023

On-Demand State Separation for Cloud Data Warehousing, VLDB 2022

Christian Winter, Jana Giceva, Thomas Neumann, and Alfons Kemper | September 5, 2022

User-Defined Operators: Efficiently Integrating Custom Algorithms into Modern Databases, VLDB 2022

Moritz Sichert and Thomas Neumann | September 5, 2022

Memory-Optimized Multi-Version Concurrency Control for Disk-Based Database Systems, VLDB 2022

Michael Freitag, Alfons Kemper, and Thomas Neumann | September 5, 2022

A Practical Approach to Groupjoin and Nested Aggregates, VLDB 2021

Philipp Fent and Thomas Neumann | August 19, 2021

JSON Tiles: Fast Analytics on Semi-Structured Data, SIGMOD 2021

Dominik Durner, Viktor Leis, and Thomas Neumann | June 25, 2021

Meet Me Halfway: Split Maintenance of Continuous Views, VLDB 2020

Christian Winter, Tobias Schmidt, Thomas Neumann, and Alfons Kemper | August 30, 2020

Interested? Contact us at

contact@cedardb.com

We'd love to hear from you!