The Energy-Throughput Trade-off in Lossless-Compressed Source Code Storage
Paolo Ferragina, Francesco Tosoni

TL;DR
This paper explores the trade-offs between space, time, and energy efficiency in lossless-compressed source code storage, demonstrating how different configurations impact performance and energy use in large-scale data retrieval.
Contribution
It introduces a compressed key-value store optimized for source code datasets, analyzing its resource trade-offs and providing guidelines for energy-aware system design.
Findings
The tested compression configurations combine high compression ratios with order-of-magnitude gains in retrieval throughput and energy efficiency.
Data parallelism significantly improves speed, but energy efficiency scales less well because modern hardware is not energy-proportional.
Different compression configurations offer distinct Pareto-optimal trade-offs.
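As an illustration of what "Pareto-optimal" means here, the following minimal sketch filters a set of configurations down to those that no other configuration beats on space, time, and energy at once. The configuration names and all numbers are made-up placeholders chosen for the example, not measurements from the paper, and `pareto_front` is a hypothetical helper.

```python
def pareto_front(configs):
    """Return the names of configurations not dominated by any other.

    Each cost tuple is (space, time, energy); lower is better on every axis.
    A configuration is dominated if another one is no worse on all axes
    and differs on at least one.
    """
    front = []
    for name, cost in configs.items():
        dominated = any(
            other != cost and all(o <= c for o, c in zip(other, cost))
            for other_name, other in configs.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder values only: (space in GB, time per lookup in ms, energy per lookup in mJ).
configs = {
    "config_a": (10.0, 5.0, 8.0),   # smallest, but slowest
    "config_b": (20.0, 1.0, 3.0),   # larger, fastest and most frugal
    "config_c": (25.0, 2.0, 4.0),   # worse than config_b on every axis
    "config_d": (15.0, 3.0, 9.0),   # a middle ground on space and time
}

front = pareto_front(configs)
```

In this toy input, `config_c` is dropped because `config_b` is better on all three axes, while the other three each win on at least one resource and so represent distinct trade-offs.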
Abstract
Retrieving data from large-scale source code archives is vital for AI training, neural-based software analysis, and information retrieval, among other uses. This paper studies the design of a compressed key-value store for indexing large-scale source code datasets and experimentally evaluates its trade-off among three primary computational resources: (compressed) space occupancy, time, and energy efficiency. Extensive experiments on a national high-performance computing infrastructure demonstrate that different compression configurations yield distinct trade-offs, with high compression ratios and order-of-magnitude gains in retrieval throughput and energy efficiency. We also study data parallelism and show that, while it significantly improves speed, scaling energy efficiency is more difficult, reflecting the known non-energy-proportionality of modern hardware and challenging…
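To make the idea of a compressed key-value store concrete, here is a minimal sketch of one common design: values are grouped into blocks, each block is compressed independently, and a lookup decompresses only the block that holds the requested key. This is an assumption-laden toy, not the authors' system; the class name `BlockCompressedStore`, the `block_size` and `level` parameters, the choice of zlib, and the synthetic files are all introduced here for illustration.

```python
import zlib


class BlockCompressedStore:
    """Toy key-value store whose values are packed into zlib-compressed blocks."""

    def __init__(self, items, block_size=4, level=6):
        # items: dict mapping a key (str) to its value (bytes)
        self.blocks = []  # list of compressed blocks
        self.index = {}   # key -> (block id, offset, length) in the uncompressed block
        keys = sorted(items)
        for start in range(0, len(keys), block_size):
            buf = bytearray()
            for key in keys[start:start + block_size]:
                value = items[key]
                self.index[key] = (len(self.blocks), len(buf), len(value))
                buf += value
            self.blocks.append(zlib.compress(bytes(buf), level))

    def get(self, key):
        # Decompress only the block containing the key, then slice out the value.
        block_id, offset, length = self.index[key]
        raw = zlib.decompress(self.blocks[block_id])
        return raw[offset:offset + length]

    def compressed_size(self):
        return sum(len(block) for block in self.blocks)


# Synthetic, highly repetitive "source files" (placeholders, not real data).
files = {
    f"src/file_{i}.py": (f"def f_{i}(x):\n    return x + {i}\n" * 20).encode()
    for i in range(64)
}

small_blocks = BlockCompressedStore(files, block_size=1)
large_blocks = BlockCompressedStore(files, block_size=16)
```

The `block_size` knob exposes the kind of trade-off the abstract refers to: larger blocks let the compressor exploit redundancy across similar files and so reduce space, but each `get` must decompress more bytes to return a single value. Timing and energy measurements are deliberately left out of the sketch, since they depend on the hardware and measurement tooling.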
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Big Data and Digital Economy
