LEA: A Learned Encoding Advisor for Column Stores
Lujing Cen, Andreas Kipf, Ryan Marcus, Tim Kraska

TL;DR
LEA is a machine learning-based system that predicts optimal column encodings in data warehouses, improving query speed and reducing storage compared to traditional heuristics.
Contribution
Introduces LEA, a learned approach for selecting column encodings that adapts to data and workload, outperforming heuristic methods.
Findings
LEA achieves 19% lower query latency on TPC-H than the heuristic-based encoding advisor of a commercial column store.
LEA decreases storage space by 26%.
LEA can optimize for encoded size, query performance, or a combination of the two.
Abstract
Data warehouses organize data in a columnar format to enable faster scans and better compression. Modern systems offer a variety of column encodings that can reduce storage footprint and improve query performance. Selecting a good encoding scheme for a particular column is an optimization problem that depends on the data, the query workload, and the underlying hardware. We introduce Learned Encoding Advisor (LEA), a learned approach to column encoding selection. LEA is trained on synthetic datasets with various distributions on the target system. Once trained, LEA uses sample data and statistics (such as cardinality) from the user's database to predict the optimal column encodings. LEA can optimize for encoded size, query performance, or a combination of the two. Compared to the heuristic-based encoding advisor of a commercial column store on TPC-H, LEA achieves 19% lower query latency…
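The abstract says LEA can optimize for encoded size, query performance, or a combination of the two. The sketch below illustrates one way such a trade-off could be expressed. It is not LEA's actual method: the `Prediction` type, the `pick_encoding` function, the `alpha` weight, the encoding names, and all numbers are hypothetical, and the per-encoding predictions are assumed to come from some already-trained model.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical model output for one candidate encoding of a column."""
    encoding: str
    size_bytes: float  # predicted encoded size (assumed > 0)
    scan_ms: float     # predicted scan latency (assumed > 0)


def pick_encoding(predictions, alpha):
    """Return the encoding with the lowest weighted cost.

    alpha = 1.0 considers size only, alpha = 0.0 considers latency only,
    and values in between blend the two. This weighting scheme is an
    assumption made for illustration, not taken from the paper.
    """
    if not predictions:
        raise ValueError("need at least one candidate encoding")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    # Normalize each metric by its best value so the two are comparable.
    min_size = min(p.size_bytes for p in predictions)
    min_scan = min(p.scan_ms for p in predictions)

    def cost(p):
        return (alpha * p.size_bytes / min_size
                + (1.0 - alpha) * p.scan_ms / min_scan)

    return min(predictions, key=cost).encoding


# Made-up predictions for a single column.
candidates = [
    Prediction("raw", size_bytes=100.0, scan_ms=10.0),
    Prediction("dictionary", size_bytes=40.0, scan_ms=12.0),
    Prediction("rle", size_bytes=25.0, scan_ms=20.0),
]

size_only = pick_encoding(candidates, alpha=1.0)
latency_only = pick_encoding(candidates, alpha=0.0)
balanced = pick_encoding(candidates, alpha=0.5)
```

With these made-up numbers, the size-only setting picks the smallest encoding, the latency-only setting picks the fastest one, and the balanced setting picks a candidate that is second best on both metrics.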
