LST-Bench: Benchmarking Log-Structured Tables in the Cloud
Jesús Camacho-Rodríguez, Ashvin Agrawal, Anja Gruenheid, Ashit Gosalia, Cristian Petculescu, Josep Aguilar-Saborit, Avrilia Floratou, Carlo Curino, Raghu Ramakrishnan

TL;DR
LST-Bench introduces a new benchmarking framework to evaluate Log-Structured Tables like Delta Lake, Iceberg, and Hudi, focusing on their performance and features in cloud data processing environments.
Contribution
The paper presents LST-Bench, a novel benchmarking framework tailored to Log-Structured Tables that enables systematic evaluation of their design choices and performance across a variety of workloads.
Findings
Effective assessment of LSTs and engines
Insights into design and optimization impacts
Open-source benchmarking toolkit
Abstract
Data processing engines increasingly leverage distributed file systems for scalable, cost-effective storage. While the Apache Parquet columnar format has become a popular choice for data storage and retrieval, the immutability of Parquet files renders it impractical to meet the demands of frequent updates in contemporary analytical workloads. Log-Structured Tables (LSTs), such as Delta Lake, Apache Iceberg, and Apache Hudi, offer an alternative for scenarios requiring data mutability, providing a balance between efficient updates and the benefits of columnar storage. They provide features like transactions, time-travel, and schema evolution, enhancing usability and enabling access from multiple engines. Moreover, engines like Apache Spark and Trino can be configured to leverage the optimizations and controls offered by LSTs to meet specific business needs. Conventional benchmarks and…
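The abstract's central idea, immutable columnar files combined with a log of table-level changes, can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not how Delta Lake, Apache Iceberg, or Apache Hudi actually implement it: the `ToyLogStructuredTable` class, the `Action` record, and the file names are hypothetical. The actions are loosely modeled on the add/remove file actions of a Delta Lake-style log (Iceberg and Hudi organize their metadata differently), and updates are handled copy-on-write: the affected file is rewritten and the old one is logically removed, so no data file is ever modified in place.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One entry in a commit: add or remove an immutable data file (hypothetical model)."""
    kind: str  # "add" or "remove"
    path: str  # name of an immutable, Parquet-like data file


@dataclass
class ToyLogStructuredTable:
    """Toy LST: data files are never modified; every change is a new commit
    appended to an ordered log of file-level actions."""
    log: list = field(default_factory=list)  # log[v] holds the actions committed at version v

    def commit(self, actions):
        """Append one commit as a single unit and return its version number."""
        self.log.append(list(actions))
        return len(self.log) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` (inclusive) to get the live file set.
        Asking for an older version is a simple form of time travel."""
        if version is None:
            version = len(self.log) - 1
        live = set()
        for actions in self.log[: version + 1]:
            for a in actions:
                if a.kind == "add":
                    live.add(a.path)
                elif a.kind == "remove":
                    live.discard(a.path)
                else:
                    raise ValueError(f"unknown action kind: {a.kind}")
        return live


# Usage: an initial load, then a copy-on-write "update" of part-1.
table = ToyLogStructuredTable()
v0 = table.commit([Action("add", "part-0.parquet"), Action("add", "part-1.parquet")])
v1 = table.commit([Action("remove", "part-1.parquet"), Action("add", "part-1-v2.parquet")])
print(sorted(table.snapshot()))    # ['part-0.parquet', 'part-1-v2.parquet']
print(sorted(table.snapshot(v0)))  # ['part-0.parquet', 'part-1.parquet']
```

Replaying the whole log on every read is the simplest possible design; real systems typically add metadata such as checkpoints or snapshot manifests so that reads stay cheap as the log grows.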
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Advanced Database Systems and Queries
