Importing Data from PostgreSQL into CedarDB

In this section, you will learn how to seamlessly transfer data from PostgreSQL to CedarDB.

Dump the schema from Postgres

Make sure your Postgres instance is running. Then use the pg_dump tool (usually shipped with PostgreSQL by your package manager):

pg_dump --schema-only postgres > schema.sql
ℹ️
postgres is the default database into which PostgreSQL inserts new tables. If you want to export the tables of a different database, replace the name in the above command.
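
For example, assuming your tables live in a database called mydb (a placeholder name):

# "mydb" is a placeholder; substitute your own database name
pg_dump --schema-only mydb > schema.sql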

Adapt the dumped schema

CedarDB does not yet support some settings PostgreSQL tries to set. Remove them from the schema dump for now by running the following three sed commands:

sed -i.bak 's/^SET.*$//g' schema.sql
sed -i.bak 's/.*set_config.*//g' schema.sql
sed -i.bak 's/^ALTER TABLE .* OWNER TO .*//g' schema.sql
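
For reference, these commands blank out lines such as the following, which pg_dump typically emits (the exact set depends on your PostgreSQL version):

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SELECT pg_catalog.set_config('search_path', '', false);
ALTER TABLE public.x OWNER TO postgres;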

Remove unsupported data types

CedarDB doesn’t support some data types yet, most notably auto-generated identity columns backed by sequences. For now, remove them by manually editing schema.sql.

For example, the statement

create table x(id integer generated always as identity);

generates the following lines in the dump file:

CREATE TABLE public.x (
    id integer NOT NULL
);

--
-- Name: x_id_seq; Type: SEQUENCE; Schema: public; Owner: postgres
--

ALTER TABLE public.x ALTER COLUMN id ADD GENERATED ALWAYS AS IDENTITY (
    SEQUENCE NAME public.x_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1
);

In this case, remove the ALTER TABLE statement.
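
If your schema contains many such identity columns, a sed range deletion can remove all of these blocks at once. This is a sketch that assumes the dump formatting shown above, i.e. each block starts with ALTER TABLE ... ADD GENERATED and ends with a line consisting only of );:

# delete everything from the ADD GENERATED line up to the closing ");"
sed -i.bak '/^ALTER TABLE .* ADD GENERATED/,/^);$/d' schema.sql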

Dump the data out of PostgreSQL into CSV files

Connect to PostgreSQL via psql.

Execute the following statement against Postgres for each table you want to export to CedarDB:

\copy {tablename} TO 'your/path/{tablename}.csv' DELIMITER '|' CSV NULL '';
ℹ️
The NULL parameter specifies the string that null values are serialized to. If your data set contains empty strings, choose a different value so they remain distinguishable from NULLs.
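
To export all tables of the public schema in one go, you can wrap the command in a small shell loop. This is a sketch assuming the defaults used above (database postgres, user postgres) and the placeholder output directory your/path:

# list all tables in the public schema, then export each one as CSV
for t in $(psql -h localhost -U postgres -Atc "SELECT tablename FROM pg_tables WHERE schemaname = 'public'"); do
    psql -h localhost -U postgres -c "\copy $t TO 'your/path/$t.csv' DELIMITER '|' CSV NULL ''"
done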

Import data from your freshly dumped files into CedarDB

Now make sure that CedarDB is running. Either let it run in parallel to Postgres on a different port, or shut down Postgres and then start CedarDB. For the following commands, we assume CedarDB listens on port 5432. If you’re using another port, please change the commands accordingly (via the argument -p {Portnumber}).
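
For example, if CedarDB listens on port 5433 (a hypothetical choice), the schema import from the next step would become:

psql -h localhost -p 5433 -U postgres < schema.sql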

Import the schema you’ve exported from PostgreSQL and modified earlier:

psql -h localhost -U postgres < schema.sql

If you get some error messages in the server log, that’s okay for now, as long as your tables are created (psql responds with CREATE TABLE).

Connect to CedarDB via psql, e.g.:

psql -h localhost -U postgres

Execute the following statement against CedarDB for each table you exported in the previous step:

copy {tablename} from 'your/path/{tablename}.csv' DELIMITER '|' CSV NULL '';

Note that the path is relative to the server, i.e., if you run CedarDB inside a Docker container, the location where the CSV files reside must be mapped as a Docker volume.

If you want the path to be relative to the client, precede the command with a backslash:

\copy {tablename} from 'your/path/{tablename}.csv' DELIMITER '|' CSV NULL '';

Note that this incurs some network overhead as the data is sent via the PostgreSQL wire protocol over the psql connection.
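
Analogous to the export, a small shell loop can import all CSV files in one go. This is a sketch using the client-relative \copy variant and assuming the file naming from the export step:

# import every exported CSV file into the table of the same name
for f in your/path/*.csv; do
    t=$(basename "$f" .csv)
    psql -h localhost -U postgres -c "\copy $t FROM '$f' DELIMITER '|' CSV NULL ''"
done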

The CSV import is currently single-threaded, as CedarDB has to correctly handle newlines and escapes. If you are sure that your strings contain neither newlines nor the delimiter, you can instead import in text mode, which is multi-threaded and thus much faster:

copy {tablename} from 'your/path/{tablename}.csv' with (format text, delimiter '|', null '');
⚠️
Multithreaded import does not yet work when using a backslash in front of copy (i.e., when importing relative to the client).
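
To sanity-check the import, you can run the same row-count query against both PostgreSQL and CedarDB and compare the results:

SELECT count(*) FROM {tablename};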

Your data is now successfully imported into CedarDB!