Tutorial: Importing JSON Data into CedarDB
CedarDB natively supports storing JSON documents in tables and working with JSON in SQL.
Two data types: json
that stores the documents as text, and jsonb
which stores a binary representation to allow
efficient access to fields of the document.
Importing JSON
You can import data from a JSON Lines file:
{"name": "Cilian Murphy", "wikiLink": "https://en.wikipedia.org/wiki/Cillian_Murphy", "gender": "M", "birthdate": "1976-05-25"}
{"name": "Emily Blunt", "wikiLink": "https://en.wikipedia.org/wiki/Emily_Blunt", "gender": "F", "birthdate": "1983-02-23"}
{"name": "Michelle Yeoh", "wikiLink": "https://en.wikipedia.org/wiki/Michelle_Yeoh", "gender": "F", "birthdate": "1962-08-06"}
{"name": "Jürgen Prochnow", "wikiLink": "https://en.wikipedia.org/wiki/Jürgen_Prochnow", "gender": "M", "birthdate": "1941-06-10"}
Load the data into a table:
create table stars_json (star json);
copy stars_json from 'stars.json';
Now you can use the json documents in SQL queries:
select star->>'name' as name from stars_json where star->>'gender' = 'F';
name
---------------
Emily Blunt
Michelle Yeoh
(2 rows)
CedarDB supports the PostgreSQL syntax for JSON attribute
access with ->
and text with ->>
.
Additionally, the json_array_elements()
function to transforms JSON arrays to SQL tables.
#>
operator) and SQL/JSON path expressions
is planned for a future CedarDB release.Relationalize JSON
To efficiently execute queries on data, we recommend to transform JSON documents to relational tables. When storing data in CedarDB’s native storage format, it uses advanced statistics and columnar data storage for efficient execution.
For the previous example, you can relationalize by creating a table with explicit data types.
null
when a key is not present.
Depending on your JSON schema, you can also mark columns as not null
.create table stars (
id integer primary key generated always as identity,
name text,
wikiLink text,
gender char,
birthdate date
);
insert into stars(name, wikiLink, gender, birthdate)
select star->>'title', star->>'wikiLink', star->>'gender', star->>'birthdate' from stars_json;