Efficiently Manage Large Datasets With Jakarta Data’s Pagination

Thursday, December 19, 2024 - 07:00

The release of the Jakarta Data specification v1.0 alongside the release of Jakarta EE 11 offers many valuable capabilities to users of NoSQL databases. It helps streamline database operations by simplifying access to, and manipulation of, data in relational and NoSQL databases.

While we already discussed introductory topics in a previous issue of this newsletter, the specification has many more advanced features that underlie much of its power and flexibility. One such feature is pagination, which is critical for managing large datasets efficiently. This article explains the pagination feature and provides a glimpse into the future of Jakarta Data, spotlighting upcoming features and ongoing discussions.

Jakarta Data Supports Both Cursor and Offset Pagination

Pagination is an essential capability for modern applications, giving users the ability to navigate vast datasets without overloading the system. The two most common methods today are offset pagination and cursor-based pagination. Jakarta Data supports both, and each has its own strengths and limitations.

1. Offset Pagination: Straightforward But Limited

Offset pagination is popular for its simplicity and ease of use. As the name suggests, it breaks data up into pages by using simple numerical offsets, typically through LIMIT and OFFSET parameters in database queries.

In practice, the process might look something like this:

@Repository
public interface BeerRepository extends BasicRepository<Beer, String> { }

// Request the first page, sorted by style in descending order
PageRequest firstPage = PageRequest.ofPage(1);
Order<Beer> order = Order.by(Sort.desc("style"));
Page<Beer> page1 = repository.findAll(firstPage, order);
System.out.println("The first page:");
page1.forEach(System.out::println);

// Ask the first page for the request that fetches the page after it
PageRequest secondPage = page1.nextPageRequest();
Page<Beer> page2 = repository.findAll(secondPage, order);
System.out.println("The second page:");
page2.forEach(System.out::println);

Because this method is so simple to implement, it’s popular and well suited to smaller datasets and to data that rarely changes. However, it tends to slow down on larger datasets, because the database must still read and skip every record before the offset. It is also susceptible to inconsistencies when data changes between requests: rows inserted or deleted ahead of the current position shift every later page, so records can be skipped or shown twice.
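To make that inconsistency concrete, here is a minimal plain-Java sketch of LIMIT/OFFSET semantics over an in-memory list. It does not use the Jakarta Data API; the class name OffsetPagingSketch, the page helper, and the sample beer names are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of LIMIT/OFFSET semantics; not the Jakarta Data API.
class OffsetPagingSketch {

    // Returns the 1-based page `pageNumber` of `rows`, in the spirit of
    // "LIMIT size OFFSET (pageNumber - 1) * size".
    static <T> List<T> page(List<T> rows, int pageNumber, int size) {
        if (pageNumber < 1 || size < 1) {
            throw new IllegalArgumentException("pageNumber and size must be positive");
        }
        int offset = (pageNumber - 1) * size; // number of records to skip
        if (offset >= rows.size()) {
            return List.of();
        }
        int end = Math.min(offset + size, rows.size());
        return List.copyOf(rows.subList(offset, end));
    }

    public static void main(String[] args) {
        List<String> beers = new ArrayList<>(
                List.of("Ale", "Bock", "Dubbel", "Lager", "Porter", "Stout"));

        System.out.println(page(beers, 1, 3)); // [Ale, Bock, Dubbel]

        // A new row sorts into the first page before the client asks for page 2.
        beers.add(0, "Abbey");

        // Every later row shifts by one position, so "Dubbel" comes back a second time.
        System.out.println(page(beers, 2, 3)); // [Dubbel, Lager, Porter]
    }
}
```

A deletion ahead of the current position has the opposite effect: later rows shift up by one, and a record is silently skipped.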

2. Cursor-Based Pagination: More Complex But Consistent

Cursor-based pagination, rather than relying on simple offsets to break up pages, uses a reference (in this case, a cursor) to sequentially retrieve records.

In practice, the process might look something like this:

@Repository
public interface FruitRepository extends BasicRepository<Fruit, String> {

    @Find
    CursoredPage<Fruit> cursor(PageRequest pageRequest, Sort<Fruit> order);

}

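// size, before, and DESC are assumed to be defined elsewhere: the page size,
// the sort-key value of the first item on the current page, and a descending Sort<Fruit>
var pageRequest = PageRequest.ofSize(size).beforeCursor(PageRequest.Cursor.forKey(before));
var page = fruitRepository.cursor(pageRequest, DESC);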

This offers several advantages over offset pagination. By avoiding the overhead associated with skipped records, cursor-based pagination offers high performance even on large datasets. It also provides consistent navigation, even with frequent updates to the dataset.

However, implementing this pagination method is significantly more complex, especially when multiple fields are used in the cursor. And because it is only capable of sequential navigation, it cannot be used for random access.
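The idea behind a cursor can be sketched in plain Java as "give me the next records whose key comes after the last one I saw". The example below is an illustration over an in-memory sorted set, not the Jakarta Data API; CursorPagingSketch, pageAfter, and the fruit names are hypothetical, and a single unique string key stands in for the cursor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Plain-Java illustration of cursor (keyset) paging; not the Jakarta Data API.
class CursorPagingSketch {

    // Returns up to `size` keys strictly after `cursor` (null means "from the start"),
    // in the spirit of "WHERE key > :cursor ORDER BY key LIMIT :size".
    static List<String> pageAfter(NavigableSet<String> keys, String cursor, int size) {
        NavigableSet<String> remaining = (cursor == null) ? keys : keys.tailSet(cursor, false);
        List<String> page = new ArrayList<>();
        for (String key : remaining) {
            if (page.size() == size) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<String> fruits = new TreeSet<>(
                List.of("apple", "banana", "cherry", "grape", "mango", "peach"));

        List<String> first = pageAfter(fruits, null, 3);
        System.out.println(first); // [apple, banana, cherry]

        // The cursor is the key of the last record the client has seen.
        String cursor = first.get(first.size() - 1);

        // Changes in the range already visited do not disturb the next page.
        fruits.add("apricot");
        fruits.remove("banana");

        System.out.println(pageAfter(fruits, cursor, 3)); // [grape, mango, peach]
    }
}
```

Because the next page is located by key rather than by position, inserts and deletes in the part of the dataset already visited neither duplicate nor skip records. The trade-off is visible too: there is no way to jump straight to an arbitrary page without walking the cursors in between.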

Jakarta Data Enhances Pagination With Type-Safe Queries

Not only does Jakarta Data support both of these pagination methods, it also introduces advanced features of its own to make them more robust and useful. Specifically, it introduces a static metamodel for type-safe queries, which simplifies implementation, reduces the errors that magic strings cause, and makes it easier to maintain code quality.

Here’s how that looks in practice:

@Entity
public class Product {
  public long id;
  public String name;
  public float price;
}

List<Product> found = products.findByNameLike(searchPattern, Order.by(
                                              _Product.price.desc(),
                                              _Product.name.asc(),
                                              _Product.id.asc()));

This static metamodel ensures robust query definitions with compile-time safety, lets refactoring tools update every reference automatically, and improves readability through self-documenting code.
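To illustrate why this beats magic strings, the following is a simplified, hand-written stand-in for the idea behind a static metamodel. It is not the generated Jakarta Data metamodel: MetamodelSketch, SortSpec, Attribute, and ProductMeta are hypothetical names, and a real metamodel class is produced by tooling rather than written by hand.

```java
import java.util.List;

// Simplified stand-in for the idea of a static metamodel; not the Jakarta Data types.
class MetamodelSketch {

    // A sort criterion: the property name plus a direction.
    record SortSpec(String property, boolean ascending) {}

    // A named entity attribute that can produce sort criteria.
    record Attribute(String name) {
        SortSpec asc()  { return new SortSpec(name, true); }
        SortSpec desc() { return new SortSpec(name, false); }
    }

    // One constant per entity attribute. The property name is spelled exactly once,
    // here, instead of being repeated as a string literal at every call site.
    static final class ProductMeta {
        static final Attribute id = new Attribute("id");
        static final Attribute name = new Attribute("name");
        static final Attribute price = new Attribute("price");
    }

    // Mirrors the ordering used in the article: price descending, then name, then id.
    static List<SortSpec> defaultOrder() {
        return List.of(ProductMeta.price.desc(),
                       ProductMeta.name.asc(),
                       ProductMeta.id.asc());
    }

    public static void main(String[] args) {
        defaultOrder().forEach(System.out::println);
    }
}
```

With this shape, a typo such as ProductMeta.prise fails at compile time, whereas a misspelled string like "prise" would only surface when the query runs.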

Jakarta Data Enhances Queries and Event Handling

Of course, the Jakarta Data project is ongoing, and work is continuing to make it more powerful and useful. There are two active areas of development related to pagination that are of particular note: improvements to queries and enhancements to life cycle events.

3. Dynamic Queries With Restrictions and Enhanced Static Safety

Users obviously need more than just static query definitions. Sometimes dynamic query definitions are necessary, such as when defining customisable search options. As such, we’re working to bring this capability to Jakarta Data through the static metamodel, to make searches more dynamic while still ensuring the safety of the compiled code.

@Repository
public interface Products extends CrudRepository<Product, Long> {
    List<Product> search(String namePattern,
                         Restriction<Product> restriction,
                         Order<Product> order);
}

List<Product> found = products.search(pattern,
                                      Restrict.all(
                                          _Product.price.greaterThan(25.0f),
                                          _Product.price.lessThan(50.0f),
                                          _Product.name.startsWith(pattern).ignoreCase(),
                                          _Product.image.notNull()),
                                      Order.by(_Product.price.desc(),
                                               _Product.name.asc()));
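Since that API is still under discussion, the general idea of assembling a query from optional criteria at run time can be sketched with the standard java.util.function.Predicate instead. This is only an in-memory analogy: RestrictionSketch, its Product record, and the restrict helper are hypothetical, and none of it is the proposed Jakarta Data Restriction API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// In-memory analogy for combining optional search criteria; not the proposed Jakarta Data API.
class RestrictionSketch {

    record Product(long id, String name, float price) {}

    // Builds one predicate from whichever criteria the caller supplied (null means "not set").
    static Predicate<Product> restrict(Float minPrice, Float maxPrice, String namePrefix) {
        List<Predicate<Product>> parts = new ArrayList<>();
        if (minPrice != null) {
            parts.add(p -> p.price() > minPrice);
        }
        if (maxPrice != null) {
            parts.add(p -> p.price() < maxPrice);
        }
        if (namePrefix != null) {
            String prefix = namePrefix.toLowerCase(Locale.ROOT);
            parts.add(p -> p.name().toLowerCase(Locale.ROOT).startsWith(prefix));
        }
        // AND all parts together; with no parts, every product matches.
        return parts.stream().reduce(p -> true, Predicate::and);
    }

    public static void main(String[] args) {
        List<Product> catalog = List.of(
                new Product(1, "Lager", 30.0f),
                new Product(2, "Lambic", 55.0f),
                new Product(3, "Stout", 30.0f));

        List<String> matches = catalog.stream()
                .filter(restrict(25.0f, 50.0f, "la"))
                .map(Product::name)
                .toList();

        System.out.println(matches); // [Lager]
    }
}
```

The point of the proposal is to keep this kind of run-time flexibility while expressing each criterion through the metamodel, so attribute names and value types are still checked by the compiler.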

4. Improved Life Cycle Event Handling

Event-driven design continues to gain prominence. To ensure Jakarta Data keeps pace, we’re looking into handling persistence life cycle events, such as:

void afterUpdate(@Observes PostUpdateEvent<Book> event) {
    // Perform post-update actions
}

Help Make Jakarta Data Even Better

The Jakarta Data specification has already proven its utility in helping developers streamline database operations.

And, as noted, that work is continuing. From simplifying pagination to introducing type-safe queries, exploring dynamic query capabilities, and beyond, the specification is setting the stage for a robust and developer-friendly future.

Participation in shaping the future of Jakarta Data is open to all. Join the discussion, learn more, and contribute to its growth.

About the Author

Otavio Santana
