AsOf Join
CedarDB supports AsOf joins for fuzzy merges, usually on time series. An AsOf join allows combining two tables on a “closest matching” attribute. This is useful to find the latest metric from one table at the time of another measurement with an ordering comparison.
Usage example:
-- Example schema
create table humidity(measure_time timestamp, value double);
create table temperature(measure_time timestamp, value double);
-- Get combined measurements, e.g., to calculate a "feels like" heat index
select h.measure_time, t.measure_time, h.value as humidity, t.value as temperature
from humidity h
asof join temperature t
on h.measure_time >= t.measure_time;
In this example, we get the latest temperature for a humidity measurement, even when the times do not line up precisely. Expressing this query in standard SQL is more involved and usually results in slower execution, see Alternative Implementations.
With AsOf joins, you can express such a fuzzy match with a time ordering.
As ordering condition, the AsOf join needs a condition with one of the following ordering comparisons:
>=
, <=
, <
, >
, and compare values from the left and right side of the join.
You can also add equality join conditions to logically group matches.
In contrast to regular joins, AsOf joins are asymmetric:
L asof join R
does not produce the same result as R asof join L
!
AsOf joins produce at most one tuple on the left side with the matching tuple from the right.
Tuples that do not find a matching value are filtered out.
Extended Example
Let’s now look at a more in-depth example using real market data from our NASDAQ example. Here, we’ll be using three tables:
orders
: a table containing all orders on the stock marketexecutions
: all actually executed ordersstocks
: static information about stock ticker symbols
In the following query, we calculate the time series with current market prices. Then we use those to find the current market price for each order, and can use that to determine if the order is likely to be executed soon.
with market_price as (
select e.stockid, e.timestamp, coalesce(e.price, o.price) price
from executions e, orders o
where e.orderid = o.orderid
)
select mp.stockid, o.orderid, o.side, mp.price as market_price, o.price as orderprice, mp.price - o.price as delta
from orders o
asof join market_price mp
on mp.stockid = o.stockid and mp.timestamp <= o.timestamp
join stocks s
on s.stockid = mp.stockid
where s.name = 'GME'
limit 10;
stockid | orderid | side | market_price | orderprice | delta
---------+---------+------+--------------+------------+---------
3405 | 3655250 | BUY | 5.8300 | 5.8400 | -0.0100
3405 | 3655254 | BUY | 5.8300 | 5.5000 | 0.3300
3405 | 3655258 | BUY | 5.8300 | 5.4000 | 0.4300
3405 | 3655262 | BUY | 5.8300 | 5.2500 | 0.5800
3405 | 3655266 | BUY | 5.8300 | 5.0800 | 0.7500
3405 | 3655270 | BUY | 5.8300 | 5.0100 | 0.8200
3405 | 3655274 | BUY | 5.8300 | 5.0000 | 0.8300
3405 | 3655278 | BUY | 5.8300 | 5.0000 | 0.8300
3405 | 3655282 | BUY | 5.8300 | 5.0000 | 0.8300
3405 | 3655286 | BUY | 5.8300 | 5.0000 | 0.8300
(10 rows)
Alternative Implementations
Open to show standard SQL alternatives for AsOf joins
-- AsOf join
select mp.stockid, o.orderid, o.side, mp.price as market_price, o.price as orderprice, mp.price - o.price as delta
from orders o
asof join market_price mp
on mp.stockid = o.stockid and mp.timestamp <= o.timestamp
join stocks s
on s.stockid = mp.stockid
where s.name = 'GME'
limit 10;
-- Subselect with limit 1
select mp.stockid, o.orderid, o.side, mp.price as market_price, o.price as orderprice, mp.price - o.price as delta
from orders o, (
select *
from market_price mp
where o.stockid = mp.stockid
and mp.timestamp <= o.timestamp
order by mp.timestamp desc
limit 1
) mp, stocks s
where s.stockid = mp.stockid
and s.name = 'GME';