Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis

Jiayong Qin; Xianyu Zhu; Qiyu Liu; Guangyi Zhang; Zhigang Cai; Jianwei Liao; Sha Hu; Jingshu Peng; Yingxia Shao; Lei Chen

arXiv:2506.20139·cs.DB·June 26, 2025

TL;DR

This paper provides a theoretical and empirical analysis of error-bounded Piecewise Linear Approximation in learned index structures, establishing new bounds and benchmarking algorithms to guide future design.

Contribution

The paper introduces a new lower bound on segment coverage for $ϵ$-PLA algorithms and benchmarks state-of-the-art methods to analyze trade-offs in learned indexes.

Findings

01

Established a lower bound of $Ω(κ \cdot ϵ^{2})$ on the expected segment coverage, where $κ$ is a data-dependent constant

02

Benchmark results reveal trade-offs among accuracy, size, and query performance

03

Guidelines for designing future learned data structures

Abstract

A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($ϵ$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $ϵ$-PLA fitting algorithms remain underexplored. In this paper, we revisit $ϵ$-PLA from both theoretical and empirical perspectives, with a focus on its application in learned index structures. We first establish a fundamentally improved lower bound of $Ω(κ \cdot ϵ^{2})$ on the expected segment coverage for existing $ϵ$-PLA fitting algorithms, where $κ$ is a data-dependent constant. We then present a comprehensive benchmark of state-of-the-art $ϵ$-PLA algorithms when…
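To make the idea of $ϵ$-PLA concrete, here is a minimal sketch in Python. It assumes the usual learned-index setting: keys are sorted and strictly increasing, the target for each key is its position in the array, and every segment must predict that position within $±ϵ$. The fitting strategy is a simple greedy "shrinking cone" pass, chosen here for brevity; it is not necessarily one of the algorithms analyzed or benchmarked in the paper. The names `fit_epsilon_pla`, `predict`, and `mean_coverage` are hypothetical and introduced only for illustration.

```python
from bisect import bisect_right


def fit_epsilon_pla(keys, eps):
    """Greedy shrinking-cone fit (illustrative, not the paper's algorithm).

    Assumes `keys` is sorted and strictly increasing. Returns a list of
    (first_key, slope, start_pos) segments such that, for every key in a
    segment, |start_pos + slope * (key - first_key) - true_pos| <= eps.
    """
    segments = []
    n = len(keys)
    start = 0
    while start < n:
        k0 = keys[start]
        lo, hi = float("-inf"), float("inf")  # feasible slope interval
        end = start + 1
        while end < n:
            dx = keys[end] - k0   # > 0 because keys are strictly increasing
            dy = end - start      # true position offset within the segment
            new_lo = max(lo, (dy - eps) / dx)
            new_hi = min(hi, (dy + eps) / dx)
            if new_lo > new_hi:   # no single slope fits all points: close segment
                break
            lo, hi = new_lo, new_hi
            end += 1
        # A one-point segment has an unconstrained slope; use 0.0 for it.
        slope = 0.0 if end == start + 1 else (lo + hi) / 2
        segments.append((k0, slope, start))
        start = end
    return segments


def predict(segments, key):
    """Approximate position of `key` using the segment that covers it."""
    # A real index would cache this list instead of rebuilding it per query.
    first_keys = [s[0] for s in segments]
    i = max(bisect_right(first_keys, key) - 1, 0)
    k0, slope, start = segments[i]
    return start + slope * (key - k0)


def mean_coverage(keys, segments):
    """Average number of keys covered per segment."""
    return len(keys) / len(segments)
```

A lookup would call `predict` and then search the range of width $2ϵ$ around the returned position. The quantity returned by `mean_coverage` is the empirical counterpart of the segment coverage that the paper bounds: larger coverage means fewer segments and therefore a smaller index for the same $ϵ$.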

Taxonomy

Topics: Neural Networks and Applications

Methods: Focus