Offset Considered Harmful or: The Surprising Complexity of Pagination in SQL
For most applications, the job of a database system is to ingest all your data and give you access to the data you care
about.
Unfortunately, sometimes this is a large amount of data, and a typical application should only carefully take small sips
from the fire hose of data.
Generally, controlling the amount of output data is hard, but SQL offers the special `limit` syntax, which can guarantee that the result is bounded, and you never turn on the fire hose’s full blast.
Unfortunately, the story does not end here, since there are many caveats.
For a `limit`, you need a sort order, and getting that to be consistent needs a bit of careful engineering.
Additionally, while `limit` has wonderful use cases, its companion `offset` gives false hope that paging through results should be equally easy.
However, using `offset` for pagination produces spurious duplicates and unexpectedly bad performance.
In the following, we explore the details behind result pagination, and why you should replace `offset` with predicate-based pagination using a `where` condition.
Processing Large Results
In a typical application stack, the job of a database system is to store large amounts of data and provide access to relevant pieces of it through queries. One catch is that even innocent and seemingly harmless database queries can return large results and, as a consequence, crash or degrade the whole system. Applications interfacing with the DB therefore need to deal with large query results.
Large result sizes are undesirable, since they use more resources in the application and reduce stability: In the best case, your application merely becomes unresponsive and uses more resources than necessary, but an overly large result can also overwhelm your application and lead to crashes. This is particularly problematic because most queries have no upper limit on the result size. Take, for example, the following SQL query:
select * from sales where seller_id = 42;
The query might look reasonable, and it even has what looks like a highly selective filter.
Depending on the data, this might return just a few results, and you can display them as is.
Or, if you only have one salesperson, it simply returns the whole database and OOMs your app.
Therefore, a good precaution to avoid overly large results is to always add a `limit` clause, which gives a hard bound on your result size.
The fundamental problem here is that for an arbitrary query, you can’t tell how many rows the result of the query will have. Database systems are built for large data, with many algorithms only streaming data through memory, which, unfortunately, also means that the DB will happily flood unsuspecting clients with data. Many applications are not designed to work with large data, and many runtimes even limit memory to around 4 GiB.
Result Pagination
As a safety measure to avoid drinking straight from the fire hose, it is a good idea to bound the result size to a fixed maximum. There are a few exceptions, e.g., when you filter on a unique key and can guarantee that results do not explode. The standard way to achieve this is result pagination, where you only load results in chunks of, e.g., 100 rows. For this, you can use a `limit` clause like so:
select * from sales where seller_id = 42 limit 100;
This solves the problem of exploding result sizes, but unfortunately, results larger than the page size are still problematic:
Say, the query has 200 results.
Which of them are included, and which are not?
You might be surprised that this query is underspecified and allows the database system to return an arbitrary selection of 100 rows.
It is very likely that if you first look at the result, you get some rows, then you hit refresh, and now you see completely different results.
To avoid flip-flopping results, you need to combine a `limit` with an `order by`:
select * from sales where seller_id = 42 order by sales_date limit 100;
This now works for most cases, but still has two subtle problems:
If two rows are tied in the sort order around the cut-off, which one is included in the result?
And, if you want to see the next page of results, how do you get the next set of values?
In the following, we first briefly look at the problem of tied values, before we get to the main point of this post: that a simple `offset` is unsuitable for paging through results.
The Problem with Ties
The problem of unordered ties around the cut-off point of 100 tuples can be subtle and not immediately obvious. Usually, results are fairly stable, and most columns tend to have many distinct values, e.g., timestamps, where ties are uncommon anyway. Therefore, unstable order specifications often slip by unnoticed and still end up in production, which then leads to unpredictable results around the page break. However, in low-cardinality columns, ties are much more common, which causes problems much more quickly.
Even for seemingly high-cardinality columns like names, there can be quite unpleasant consequences of improperly handled ties. I distinctly remember one example from when I taught undergraduate database courses at TUM. We had several hundred students signed up for the exam, and we needed to assign them rooms beforehand, so they knew which part of campus to head to. At the time, the tool that assigned students to rooms used student names for easily recognizable assignments:
Students | Room |
---|---|
Ada - Müller | Lecture Hall 1 |
Müller - Zuse | Lecture Hall 2 |
Do you spot the problem? At the time, there were four or five students with last name “Müller” (i.e., “Miller”, one of the most common German surnames). Having an ambiguous room assignment for those students was obviously not ideal, and we really didn’t want to send a bunch of nervous students halfway across campus five minutes before their exam. After some discussion, we managed to switch the room assignment to use student ID numbers instead. While this is somewhat more inconvenient for students (Who remembers all of their countless ID numbers?), the assignment now can’t have any ties.
Dynamic result pagination for database queries with `limit` has a similar problem.
With an ambiguous ordering condition around the fold between two pages, items might shuffle around between executions of the query.
So, you might end up seeing item A on both page 1 and page 2, while item B appears on neither.
However, ordering only by a unique ID is not a generally desirable solution, because sorting by arbitrary columns is often a business requirement:
When looking at your order history online, you expect it to be sorted newest to oldest, not in an arbitrary order with year-old orders mixed between recent ones.
In these cases, you can include an ID as a last “tiebreaker” column, i.e., `order by timestamp, id`.
This way, you get a guaranteed stable sort order, with no flaky results.
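Applied to the earlier sales example, the stabilized query could look like the following sketch, assuming `id` is a unique key column of `sales`:

```sql
-- Ties on sales_date are broken by the unique id column,
-- so the same 100 rows come back on every execution.
select *
from sales
where seller_id = 42
order by sales_date, id
limit 100;
```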
The Problem with Offset
So, when your first page of results doesn’t cut it, how do you get the next batch of results?
A convenient, but unfortunately flawed, way is to use `limit`’s evil brother `offset`:
a `limit 10 offset 10` for the second page, an `offset 20` for the third, etc.
Using offsets like this can still produce duplicate results, and it is pretty bad for performance, in the worst case scanning the whole table for each page of ten elements.
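In SQL, offset-based paging through the sales example might look like this sketch of the pattern this section argues against (again assuming an `id` tiebreaker column):

```sql
-- Page 3 with offset pagination: skip the first 20 rows, return the next 10.
select *
from sales
where seller_id = 42
order by sales_date, id
limit 10 offset 20;
```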
Duplicate Results
In the previous section, we already saw that we need a stable sort order for consistent results.
The problems, however, do not stop there:
You can still see results twice when new items get inserted concurrently.
For example, when you `order by date desc`, new entries always land on the first page.
Now, when you take some time looking at the first page before switching to the second, a newly inserted row shifts the page break back by one.
Now you see elements twice!
The last element of the first page becomes the first element of the second page.
But with a deterministic sort order, you at least can’t “lose” elements that you never see.
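As a concrete illustration, consider this hypothetical session against a `news` table; the table, its `published_at` column (standing in for the `date` column above), and the inserted row are made up for the example:

```sql
-- Session A: fetch page 1, newest entries first.
select * from news order by published_at desc, id desc limit 10;

-- Session B: meanwhile, a new row is inserted that sorts before everything else.
insert into news (published_at, title) values (now(), 'Breaking story');

-- Session A: fetch page 2. All old rows shifted back by one position,
-- so the last row of page 1 reappears as the first row of page 2.
select * from news order by published_at desc, id desc limit 10 offset 10;
```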
Bad Performance
Offset has another downside:
It needs to sort and then “skip” all the offset rows.
This can sometimes be surprising, since we still only output the same number of tuples.
Most simple queries only scale with the amount of data produced, and with a `limit` clause, this is a hard upper bound.
So a `limit 100` query takes roughly twice as long[^1] as a `limit 50`.
Specifying an `offset` breaks this assumption, since the query now scales with `limit + offset` instead of only the limit.
When paginating a large data set, this becomes surprisingly costly.
While a `limit` usually stays constant, an `offset` can grow to the size of your data.
When paginating with a large offset, say you want to see page 1,000, which shows the results starting at an offset of 100,000, this can quickly exceed in-memory capacity for efficient sorting.
Even with an already sorted B-Tree index, we can’t efficiently skip rows, since the branches of a B-Tree do not contain a fixed number of tuples, and we need to look at each leaf page.
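For the sales example, such a deep page looks harmless but forces the system to produce and discard 100,000 rows first; a sketch, with the round offset chosen for illustration:

```sql
-- A deep page: the database must sort (or walk an index over)
-- the first 100,000 matching rows just to throw them away.
select *
from sales
where seller_id = 42
order by sales_date, id
limit 100 offset 100000;
```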
Root Cause Analysis
The root causes of these problems are not inherent to paging, however, but to the specific implementation using `offset`.
The goal of pagination is a stable view of your data in small chunks.
Traditionally, database systems would support this with transactions, but this is a very bad idea since it makes connections stateful.
In contrast, offset pagination only requires state on the client, i.e., which page it wants to view.
So, can we build a stable view without `offset`?
When we look at Wikipedia or well-respected online guides, they recommend using a positional filter with a `where` clause.
Let’s look at an example of how this could look:
Page 1:

Name | ID |
---|---|
Aho | 314 |
Bayer | 159 |
… | |
Neumann | 123 |
Neumann | 456 |

Page 2:

Name | ID |
---|---|
Neumann | 789 |
Page | 256 |
… | |
Instead of `offset 10` to get to the second page, we can use a `where (name, id) > ('Neumann', 456)` condition to skip the first 10 tuples.
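Put together, the two page queries could look like this sketch, assuming a `students` table with the `name` and `id` columns from the tables above; the row-value comparison syntax works in PostgreSQL-compatible systems and can be expanded into explicit comparisons elsewhere:

```sql
-- Page 1: the first 10 rows in a stable order.
select name, id
from students
order by name, id
limit 10;

-- Page 2: continue after the last row seen on page 1, ('Neumann', 456).
select name, id
from students
where (name, id) > ('Neumann', 456)
order by name, id
limit 10;
```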
So, does it help?
For the duplicate problem with concurrent inserts, the `where` condition ensures that the last element we saw is never part of the next page.
So, we will never see duplicates.
Our second page of results might now technically start at row 12, but the overall user experience is much more stable and consistent.
For performance, the difference is also night and day.
When we have an index, fetching page 1,000 now takes the same amount of time as fetching the first page.
Even when the data needs to be explicitly sorted, the filter condition keeps the overall state minimal.
Additionally, many modern systems, including CedarDB, implement lightweight zone maps, which keep track of min/max bounds for columns.
A `where` condition that excludes many tuples will then use the zone maps to only look at relevant data, which makes the query fast even without an explicit index.
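To get the index-backed fast path for the sales example, a composite index matching the filter and the pagination order is enough; a sketch, where the index name and the `id` column are illustrative:

```sql
-- The index covers the filter column and the full sort key, so each page
-- is a single index seek followed by a short scan of one page worth of rows.
create index sales_pagination_idx on sales (seller_id, sales_date, id);
```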
Summary
So what should you take away for your own paginating applications? My recommendations for building pagination are:
- Use `limit` liberally.
- Include IDs in `order by` as a tiebreaker.
- Stop using `offset`.
While this should be a core competence of ORMs, unfortunately, few ORMs get this right from the get-go.
Almost all ORMs default to `offset` pagination, and only the more advanced ones like Laravel or Drizzle provide guidance on implementing cursor-based pagination that generates `where` conditions.
Confusingly, this does not use the stateful cursors of standard SQL.
So if you are building ORMs in your spare time, you should probably consider my advice as well.
[^1]: This is, of course, an overgeneralization and assumes you don’t use complex joins or aggregations. And, technically, there’s an additional logarithmic term, so it’s a bit more, but that still “feels like” 2x.