Configuration

Configuration

In this part of the documentation, we will show you settings to control CedarDB’s resource usage, help you collect benchmark results, and highlight some of our expert configuration options. Usually, configuring these options is unnecessary, as CedarDB uses strategies to automatically choose the best settings for you.

Resource usage

Buffer Size

By default, CedarDB uses 50% of available system memory for our buffer manager. If you are running multiple applications on a single instance, you may want to reduce this amount. Unlike all other settings, this setting must be set during the startup phase of CedarDB. The amount of memory used can be set via an environment variable in your shell. The following shell command sets the available buffer size to 1 GB.

export BUFFERSIZE=1G

Degree of parallelism

CedarDB also uses all threads of the system for best performance. This is intended behaviour, but might generate high load on your machine. If you want to keep other applications responsive, consider starting CedarDB with nice. Alternatively, you can limit the number of threads CedarDB uses. Note, however, that this will limit the performance of CedarDB, since all queries will take advantage of the full parallelism of the system. This stems from our superior morsel-driven parallelization strategy. To change the parallelism, simply query the database with the following PostgreSQL-style set command

set debug.parallel=8;

Storage types

CedarDB implements several storage types. As you have seen in an earlier chapter of this documentation, you can specify the storage type when you create a table.

create table persons (
    id integer primary key generated always as identity,
    name text
) with (storage = columnar);

We support different storage types:

  • columnar is the best choice for hybrid and analytical workloads. It leverages the buffer manager and provides ACID guarantees.
  • paged uses a PAX-layout that leverages the buffer manager and provides ACID guarantees. It is optimized for OLTP workloads.
  • mapped is an in-memory only relation optimized for highest performance that should not be used in production.

Benchmarking

Repetition of queries

To validate performance, it is important to run queries multiple times. To repeat a query, you can either repeat the execution (e), the compilation (c), or both (a). To change the repetition mode, simply query this command.

set debug.repeatmode='a';

The number of repetitions can be set with the following command.

set debug.repeat=3;

Timeout

In the unlikely event of a long-running query, you may want to set a query time after which the query is terminated automatically. This can be accomplished with our timeout setting. This setting specifies the timeout in milliseconds, with 0 milliseconds disabling the timeout.

set debug.timeout=1000;

SQL tool

When you run our sql tool, we have additional commands that will help you benchmark our system.

General commands

For a list of possible commands, use our help command.

\?

If you want to specify a query that is stored in a file, you can load and run the query with our input command.

\i path/to/query.sql

Performance statistics commands

To record our performance statistics, you can create a CSV with our performance results. Just specify the output CSV with the following setting.

\record path/to/perf.csv

The output of the queries can be redirected to files (or /dev/null).

\o path/to/output

Advanced configuration

Although these settings are usually determined automatically, we will briefly discuss some of our advanced settings.

Compilation strategy

CedarDB is a compiling database, so each query is compiled into machine executable code. To do this, we first create our own intermediate representation (think LLVM-IR), which is then compiled into machine code. We provide several compilation backends:

  • Adaptive a: Adaptively chooses the best backend according to the current execution of the query and changes the execution over the duration of the query. It starts with the fastest latency backend and gradually transitions to faster but higher latency backends (default).
  • Interpreted i: Interprets the generated code with very low latency but achieves only low performance.
  • DirectEmit d: Directly generates executable machine code from our intermediate representation with low latency and good query execution performance.
  • Cheap c: Medium latency backend that compiles the intermediate representation with LLVM and few optimizations and good query execution performance.
  • Optimized o: High latency backend that compiles the intermediate representation with LLVM and many optimizations for superior execution performance.
set debug.compilationmode='a';

Multiway joins

In addition to binary joins, CedarDB also implements multiway joins. These joins are particularly useful for graph workloads. Because most workloads do not benefit from these types of joins, we use them conservatively. If your workload does benefit from such joins, you can direct the database system to use them more actively with the following options:

  • Cautious c: Conservatively use multiway joins only when these joins clearly outperform binary joins (default).
  • Eager e: Use multiway joins more aggressively when the estimated runtime is slightly improved.
  • Disabled d: Allows only binary joins.
set debug.multiway='c';