LeCo: Lightweight Compression via Learning Serial Correlations
Yihao Liu, Xinyu Zeng, Huanchen Zhang

TL;DR
LeCo is a machine-learning-based framework that automatically learns and removes serial correlations in data columns. It achieves higher compression ratios and faster decompression than existing lightweight methods, and the gains carry over to real-world applications.
Contribution
LeCo introduces a general machine learning framework that unifies and improves upon existing compression algorithms by exploiting serial correlations in data columns.
Findings
Achieves a Pareto improvement in compression ratio and access speed.
Up to 5.2x faster query performance in data analytics.
16% throughput increase in RocksDB.
Abstract
Lightweight data compression is a key technique that allows column stores to exhibit superior performance for analytical queries. Despite a comprehensive study on dictionary-based encodings to approach Shannon's entropy, few prior works have systematically exploited the serial correlation in a column for compression. In this paper, we propose LeCo (i.e., Learned Compression), a framework that uses machine learning to remove the serial redundancy in a value sequence automatically to achieve an outstanding compression ratio and decompression performance simultaneously. LeCo presents a general approach to this end, making existing (ad-hoc) algorithms such as Frame-of-Reference (FOR), Delta Encoding, and Run-Length Encoding (RLE) special cases under our framework. Our microbenchmark with three synthetic and six real-world data sets shows that a prototype of LeCo achieves a Pareto…
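The idea that FOR becomes a special case of a learned model can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fit_line`, `encode`, `decode_at`), the choice of a least-squares line as the regressor, and the dictionary layout are all assumptions made for illustration, and real bit-packing of the deltas is omitted. Each value is stored as the difference from a model prediction, so the better the model captures the serial pattern, the fewer bits each stored delta needs.

```python
from math import floor


def fit_line(values):
    """Least-squares line through the points (i, values[i]).

    Returns (slope, intercept). A linear regressor is an assumption here;
    any model that predicts a value from its position would fit the scheme.
    """
    n = len(values)
    if n == 1:
        return 0.0, float(values[0])
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def encode(values, model):
    """Store each value as a non-negative delta from the model prediction."""
    slope, intercept = model
    preds = [floor(slope * i + intercept) for i in range(len(values))]
    residuals = [v - p for v, p in zip(values, preds)]
    bias = min(residuals)                    # shift so every delta is >= 0
    deltas = [r - bias for r in residuals]
    width = max(deltas).bit_length()         # bits per delta if bit-packed
    return {"slope": slope, "intercept": intercept,
            "bias": bias, "width": width, "deltas": deltas}


def decode_at(block, i):
    """Random access: recompute the prediction at position i, add the delta."""
    pred = floor(block["slope"] * i + block["intercept"])
    return pred + block["bias"] + block["deltas"][i]


# A hypothetical column with a strong serial trend plus small noise.
values = [1000 + 7 * i + (i % 3) for i in range(64)]

# FOR viewed as a constant model: slope 0, intercept = minimum value.
for_block = encode(values, (0.0, float(min(values))))

# A learned linear model captures the trend, leaving only small deltas.
leco_like_block = encode(values, fit_line(values))

print("FOR bits per value:   ", for_block["width"])
print("Linear bits per value:", leco_like_block["width"])
```

Because decoding one position needs only the model parameters and a single delta, random access stays cheap in this sketch; the per-block model parameters are the extra storage cost that the narrower deltas have to pay back.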
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · Advanced Database Systems and Queries
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
