LogShrink: Effective Log Compression by Leveraging Commonality and Variability of Log Data
Xiaoyun Li, Hongyu Zhang, Van-Hoang Le, Pengfei Chen

TL;DR
LogShrink is a novel log compression method that leverages the inherent commonality and variability in log data to significantly reduce storage requirements while maintaining efficiency.
Contribution
This paper introduces LogShrink, a new log compression technique that exploits characteristics of log data. Its components include an analyzer based on the longest common subsequence (LCS) and entropy, and a clustering-based sampler, both aimed at improving compression.
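The sketch below makes the two named techniques concrete. A token-level longest common subsequence (LCS) between two log lines recovers the tokens they share, which reflects commonality. Shannon entropy over the values of one field measures how much that field varies, which reflects variability. This is a generic illustration of LCS and entropy, not LogShrink's analyzer. The function names `lcs_tokens` and `shannon_entropy` and the sample HDFS-style log lines are hypothetical.

```python
import math
from collections import Counter


def lcs_tokens(a, b):
    """Longest common subsequence of two token lists (classic dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack from the bottom-right cell to recover the shared tokens in order.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical usage: the shared tokens approximate a template, and the
# per-field entropy separates constant fields from highly variable ones.
line1 = "Received block blk_1 of size 67108864 from 10.0.0.1".split()
line2 = "Received block blk_2 of size 67108864 from 10.0.0.2".split()
print(lcs_tokens(line1, line2))
print(shannon_entropy(["67108864", "67108864"]))  # constant field
print(shannon_entropy(["blk_1", "blk_2"]))        # distinct values
```

In this sketch, a field with entropy near zero is almost constant and cheap to store. Fields with high entropy carry most of the information a compressor must keep.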
Findings
LogShrink achieves a compression ratio 16% to 356% higher than the baselines.
It effectively captures log data commonality and variability for compression.
The method maintains reasonable compression speed.
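The reported gains are relative differences in compression ratio. Under the usual definition, the ratio is original size divided by compressed size, so higher is better. With that definition, a 16% gain means a ratio 1.16 times the baseline's. This definition is an assumption here. The sketch below computes both quantities. `zlib` only stands in for a general-purpose compressor, and the numbers passed to the hypothetical `relative_gain` helper are illustrative. They are not results from the paper.

```python
import zlib


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Assumed definition: original size / compressed size (higher is better)."""
    return len(raw) / len(compressed)


def relative_gain(cr_new: float, cr_baseline: float) -> float:
    """Percentage by which cr_new exceeds cr_baseline."""
    return (cr_new / cr_baseline - 1.0) * 100.0


# Repetitive, log-like input already compresses well with a general-purpose codec.
raw = b"".join(b"INFO Received block blk_%d of size 67108864\n" % i for i in range(1000))
cr = compression_ratio(raw, zlib.compress(raw, 9))
print(f"zlib compression ratio: {cr:.1f}")

# Illustrative only: a ratio of 2.32 against a baseline of 2.0 is a 16% gain.
print(f"{relative_gain(2.32, 2.0):.0f}%")
```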
Abstract
Log data is a crucial resource for recording system events and states during system execution. However, as systems grow in scale, log data generation has become increasingly explosive, leading to an expensive overhead on log storage, such as several petabytes per day in production. To address this issue, log compression has become a crucial task in reducing disk storage while allowing for further log analysis. Unfortunately, existing general-purpose and log-specific compression methods have been limited in their ability to utilize log data characteristics. To overcome these limitations, we conduct an empirical study and obtain three major observations on the characteristics of log data that can facilitate the log compression task. Based on these observations, we propose LogShrink, a novel and effective log compression method by leveraging commonality and variability of log data. An…
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Scientific Computing and Data Management
