DeLog: An Efficient Log Compression Framework with Pattern Signature Synthesis
Siyu Yu, Yifan Wu, Junjielong Xu, Ying Fu, Ning Wang, Maoyin Liu, Pancheng Jiang, Xiang Zhang, Tong Jia, Pinjia He, Ying Li

TL;DR
DeLog is a log compression framework that uses pattern signature synthesis to improve both compression ratio and speed. Its accompanying study shows that effective pattern-based grouping matters more for compression ratio than parsing accuracy does.
Contribution
The paper presents DeLog, a new log compressor that uses pattern signature synthesis to enhance compression performance, challenging the assumption that higher parsing accuracy always yields better compression.
Findings
DeLog achieves state-of-the-art compression ratios on multiple datasets.
Pattern-based grouping influences compression ratio more than parsing accuracy does.
DeLog is faster than existing methods.
Abstract
Parser-based log compression, which separates static templates from dynamic variables, is a promising approach to exploit the unique structure of log data. However, its performance on complex production logs is often unsatisfactory. This performance gap coincides with a known degradation in the accuracy of its core log parsing component on such data, motivating our investigation into a foundational yet unverified question: does higher parsing accuracy necessarily lead to a better compression ratio? To answer this, we conduct the first empirical study quantifying this relationship and find that a higher parsing accuracy does not guarantee a better compression ratio. Instead, our findings reveal that compression ratio is dictated by achieving effective pattern-based grouping and encoding, i.e., the partitioning of tokens into low-entropy, highly compressible groups. Guided by this…
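The general idea the abstract describes, separating static templates from dynamic variables and partitioning tokens into groups that compress well, can be illustrated with a small Python sketch. This is not DeLog's algorithm and not its pattern signature synthesis. It is a toy illustration under stated assumptions: the digit-based rule for spotting variables, the `<*>` placeholder, and every function name (`split_line`, `group_tokens`, `rebuild`, `grouped_size`) are hypothetical choices made for this example, and zlib stands in for whatever backend compressor a real system would use.

```python
import re
import zlib
from collections import defaultdict

# Assumption for this sketch: a token containing a digit is a dynamic variable.
VARIABLE = re.compile(r"\d")
PLACEHOLDER = "<*>"  # hypothetical marker; assumes no raw token equals it


def split_line(line):
    """Split one log line into a static template and its variable tokens."""
    template, variables = [], []
    for token in line.split(" "):
        if VARIABLE.search(token):
            template.append(PLACEHOLDER)
            variables.append(token)
        else:
            template.append(token)
    return " ".join(template), variables


def group_tokens(lines):
    """Partition a log into a template dictionary, a template-ID stream,
    and one column of values per (template ID, variable slot)."""
    template_ids = {}
    id_stream = []
    columns = defaultdict(list)
    for line in lines:
        template, variables = split_line(line)
        tid = template_ids.setdefault(template, len(template_ids))
        id_stream.append(str(tid))
        for slot, value in enumerate(variables):
            columns[(tid, slot)].append(value)
    return template_ids, id_stream, dict(columns)


def rebuild(template_ids, id_stream, columns):
    """Reverse group_tokens to show that the grouping is lossless."""
    templates = {tid: text for text, tid in template_ids.items()}
    cursors = defaultdict(int)  # next unread value in each column
    lines = []
    for tid_text in id_stream:
        tid = int(tid_text)
        slot = 0
        tokens = []
        for token in templates[tid].split(" "):
            if token == PLACEHOLDER:
                key = (tid, slot)
                tokens.append(columns[key][cursors[key]])
                cursors[key] += 1
                slot += 1
            else:
                tokens.append(token)
        lines.append(" ".join(tokens))
    return lines


def grouped_size(lines):
    """Total zlib-compressed size, in bytes, of all grouped streams."""
    template_ids, id_stream, columns = group_tokens(lines)
    streams = ["\n".join(template_ids), "\n".join(id_stream)]
    streams += ["\n".join(column) for column in columns.values()]
    return sum(len(zlib.compress(s.encode("utf-8"), 9)) for s in streams)


sample = [
    "Connection from 10.0.0.1 port 5000",
    "Connection from 10.0.0.2 port 5001",
    "Disk usage at 81%",
]
template_ids, id_stream, columns = group_tokens(sample)
```

Each column holds values of a single kind (all IP addresses, or all port numbers), which is the sense in which grouping aims at low-entropy streams. Whether the grouped streams end up smaller than compressing the raw text depends on the data and the backend compressor, so the sketch makes no claim about the resulting ratio.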
Taxonomy
Topics: Software System Performance and Reliability · Cloud Computing and Resource Management · Time Series Analysis and Forecasting
