November 19, 2024

Offset Considered Harmful or: The Surprising Complexity of Pagination in SQL

Have you ever wondered why you sometimes see duplicate results when clicking on the second page of a website? In this blog post, we explore techniques for result pagination, how they impact the work necessary to compute the results, and why using the SQL offset keyword for it is not a good idea.

Philipp Fent
For most applications, the job of a database system is to ingest all your data and give you access to the data you care about. Unfortunately, sometimes that is a large amount of data, and a typical application should only carefully take small sips from the fire hose. Generally, controlling the amount of output data is hard, but SQL offers the special limit syntax, which guarantees that the result is bounded and you never turn on the fire hose’s full blast. However, the story does not end there, since there are many caveats.

For a limit, you need a sort order, and getting that to be consistent needs a bit of careful engineering. Additionally, while limit has wonderful use cases, its companion offset gives false hope that paging through results should be equally easy. However, using offset for pagination results in spurious duplicates and unexpectedly bad performance. In the following, we explore the details behind result pagination, and why you should replace offset with predicate-based pagination with a where condition.

Processing Large Results

In a typical application stack, the job of a database system is to store large amounts of data and provide access to relevant pieces of it through queries. One catch is that even innocent, seemingly harmless database queries can return large results and, as a consequence, crash or degrade the whole system. Applications interfacing with the DB therefore need to be prepared to deal with large query results.

Large result sizes are undesirable, since they waste resources in the application and reduce stability: In the best case, your application merely becomes unresponsive and uses more resources than necessary, but an overly large result can also overwhelm your application and lead to crashes. This is particularly problematic because most queries have no upper limit on the result size. Take, for example, the following SQL query:

select * from sales where seller_id = 42;

The query might look reasonable, and it even has a highly selective constraint. Depending on the data, this might return just a few results, and you can display them as is. Or, if you only have one sales person, this just returns the whole database and OOMs your app. Therefore, a good precaution to avoid overly large results is to always add a limit clause, which gives hard bounds to your result size.

The fundamental problem here is that for an arbitrary query, you can’t tell how many rows the result of the query will have. Database systems are built for large data, with many algorithms only streaming data through memory, which, unfortunately, also means that the DB will happily flood unsuspecting clients with data. Many applications are not designed to work with large data, and many runtimes even limit memory to around 4 GiB.

Result Pagination

As a safety measure to avoid drinking straight from the fire hose, it is a good idea to bound the result size to a fixed maximum. There are a few exceptions, e.g., when you filter on a unique key and can thus guarantee that the result does not explode. The standard way to achieve this is result pagination, where you only load results in chunks of, e.g., 100 rows. For this, you can use a limit clause like so:

select * from sales where seller_id = 42 limit 100;

This solves the problem of exploding result sizes, but unfortunately, large results are still problematic: Say the query has 200 results. Which of them are included, and which are not? You might be surprised that this query is underspecified and allows the database system to return an arbitrary selection of 100 rows. It is very likely that you first look at the result and get some rows, then hit refresh, and suddenly see completely different results. To avoid flip-flopping results, you need to combine a limit with an order by:

select * from sales where seller_id = 42 order by sales_date limit 100;

This now works for most cases, but still has two subtle problems: If two rows are tied in the sort order around the cut-off, which one is included in the result? And, if you want to see the next page of results, how do you get the next set of values? In the following, we first briefly look at the problem of tied values, before we get to the main point of this post: that a simple offset is unsuitable for paging through results.

The Problem with Ties

The problem of unordered ties around the cut-off point of 100 tuples can be subtle and not immediately obvious. Usually, results are fairly stable, and most columns tend to have many distinct values, e.g., timestamps, where ties are uncommon anyway. Therefore, unstable order specifications often slip by unnoticed and end up in production, where they cause unpredictable results around the page break. In low-cardinality columns, however, ties are much more common, which causes problems much more quickly.
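
As a sketch, consider ordering by a hypothetical low-cardinality status column in the sales example from above: almost every row ties with its neighbors, so which rows end up just before or after the cut-off is essentially arbitrary.

select * from sales where seller_id = 42 order by status limit 100;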

Even for seemingly high-cardinality columns like names, there can be quite unpleasant consequences of improperly handled ties. I distinctly remember one example from when I taught undergraduate database courses at TUM. We had several hundred students signed up for the exam, and we needed to assign them rooms beforehand, so they knew which part of campus to head to. At the time, the tool that assigned students to rooms used student names for easily recognizable assignments:

Students         Room
Ada - Müller     Lecture Hall 1
Müller - Zuse    Lecture Hall 2

Exam room assignments based on last names

Do you spot the problem? At the time, there were four or five students with last name “Müller” (i.e., “Miller”, one of the most common German surnames). Having an ambiguous room assignment for those students was obviously not ideal, and we really didn’t want to send a bunch of nervous students halfway across campus five minutes before their exam. After some discussion, we managed to switch the room assignment to use student ID numbers instead. While this is somewhat more inconvenient for students (Who remembers all of their countless ID numbers?), the assignment now can’t have any ties.

Dynamic result pagination for database queries with limit has a similar problem. With an ambiguous ordering condition around the fold of two pages, items might shuffle around between executions of the query. So, you might end up seeing item A on both page 1 and page 2, while item B appears on neither. However, ordering only by a unique ID is not a generally desirable solution, because sorting by arbitrary columns is often a business requirement: When looking at your order history online, you expect orders to be sorted newest to oldest, not in an arbitrary order with year-old orders mixed between recent ones.

In these cases, you can include an ID as a last “tiebreaker” column, i.e., order by timestamp, id. This way, you get a guaranteed stable sorting, with no flaky ordering.
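
Applied to the earlier sales example, and assuming the sales table has a unique id column, such a tiebreaker could look like this:

select * from sales where seller_id = 42 order by sales_date, id limit 100;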

The Problem with Offset

So, when your first page of results doesn’t cut it, how do you get the next batch of results? A convenient, but unfortunately flawed way is to use limit’s evil brother offset. So, a limit 10 offset 10 for the second page, an offset 20 for the third, etc. Using offsets like this can still produce duplicate results, and is pretty bad for performance, in the worst case scanning the whole table for each page of ten elements.
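
Sticking with the sales example (and again assuming a unique id column as tiebreaker), offset-based pagination would look roughly like this:

select * from sales where seller_id = 42 order by sales_date, id limit 10 offset 10; -- page 2
select * from sales where seller_id = 42 order by sales_date, id limit 10 offset 20; -- page 3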

Duplicate Results

In the previous section, we already saw that we need a stable sort order for consistent results. The problems, however, do not stop there: You can still see results twice when new items get inserted concurrently. For example, when you order by date desc, new entries always land on the first page. If you take some time looking at the first page before switching to the second, a newly inserted row shifts the page break back by one, and you see an element twice: The last element of the first page reappears as the first element of the second page. At least with a deterministic sort order, you can’t “lose” elements that you never see.
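
To make the failure mode concrete, here is a sketch with the earlier sales example, ordered newest first (id as a hypothetical tiebreaker column):

-- Page 1: the ten newest sales
select * from sales order by sales_date desc, id desc limit 10;
-- Meanwhile, another session inserts a new sale, which sorts to the very top.
-- Page 2: everything shifts back by one, so the old 10th row shows up again.
select * from sales order by sales_date desc, id desc limit 10 offset 10;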

Bad Performance

Offset has another downside: It needs to sort and then “skip” all the offset values. This can sometimes be surprising, since we still only output the same number of tuples. Most simple queries only scale with the amount of data produced, and with a limit clause, this is a hard upper bound. So a limit 100 query takes roughly twice as long¹ as a limit 50. Specifying an offset breaks this assumption, since the query now scales with limit + offset instead of only the limit.

When paginating a large data set, this becomes surprisingly costly. While a limit usually stays constant, an offset can grow to be the size of your data. When paginating with a large offset, say you want to see page 1000, which shows the results starting at an offset of 100,000, this can quickly exceed in-memory capacity for efficient sorting. Even with an already sorted B-Tree index, we can’t efficiently skip rows, since the branches of a B-Tree do not contain a fixed number of tuples, and we need to look at each leaf page.
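
For illustration, a page-1000 query like the following looks harmless, but the database has to produce and throw away the first 100,000 sorted rows before it can return the 100 you asked for:

select * from sales where seller_id = 42 order by sales_date, id limit 100 offset 100000;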

Root Cause Analysis

The root causes of these problems are not inherent to paging, however, but to the specific implementation using offset. The goal of pagination is a stable view of your data in small chunks. Traditionally, database systems would support this with transactions, but this is a very bad idea since they make connections stateful. In contrast, offset pagination only requires state on the client, i.e., which page it wants to view.

So, can we build a stable view without offset? When we look at Wikipedia or well-respected online guides, they recommend using a positional filter with a where clause. Let’s look at an example of what this could look like:

Page 1 (last rows):

Name       ID
Aho        314
Bayer      159
Neumann    123
Neumann    456

Page 2 (first rows):

Name       ID
Neumann    789
Page       256

Example with 10 elements per page

Instead of offset 10 to get to the second page, we can use a where (name, id) > ('Neumann', 456) condition to skip the first 10 tuples.
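
Written out as a full query over a hypothetical students table with name and id columns, fetching the page after ('Neumann', 456) could look like this:

select name, id from students where (name, id) > ('Neumann', 456) order by name, id limit 10;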

So, does it help? For the duplicate problem with concurrent inserts, the where condition ensures that the last element we saw is never part of the next page, so we will never see duplicates. If rows were inserted in the meantime, our second page might technically start at what is now row 12 of the overall result, but the overall user experience is much more stable and consistent.

For performance, the difference is also night-and-day. When we have an index, fetching page 1000 now takes the same amount of time as the first page. Even when the data needs to be explicitly sorted, the filter condition keeps the overall state minimal. Additionally, many modern systems, including CedarDB, implement lightweight zone maps, which keep track of min/max bounds for columns. A where condition that excludes many tuples will then use the zone maps to only look at relevant data, which makes the query fast even without an explicit index.
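
If you rely on an index for this, it should match the pagination order; for the hypothetical students table above, something like:

create index students_name_id_idx on students (name, id);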

Summary

So what should you take away for your own applications? My personal recommendations for building pagination are:

  1. Use limit liberally.
  2. Include IDs in order by as a tiebreaker.
  3. Stop using offset.

While this should be a core competence of ORMs, unfortunately, few ORMs get this right from the get-go. Almost all ORMs default to offset pagination, and only the more advanced ones like Laravel or Drizzle provide guidance on implementing cursor-based pagination that generates where conditions. Confusingly, this doesn’t use the stateful standard SQL cursors. So if you are building an ORM in your spare time, you should probably consider my advice as well.


  1. This is, of course, an overgeneralization and assumes you don’t use complex joins or aggregations. And, technically, there’s an additional logarithmic term, so it’s a bit more, but that still “feels like” 2x.

