Unlocking the Power of Numbers: Log Compression via Numeric Token Parsing
Siyu Yu, Yifan Wu, Ying Li, Pinjia He

TL;DR
Denum is a novel log compression method that focuses on effectively compressing numeric tokens, resulting in significantly better compression ratios and faster speeds compared to existing methods.
Contribution
The paper introduces Denum, a new log compressor that leverages numeric token parsing to improve compression efficiency and speed, addressing limitations of previous parser-based approaches.
Findings
Denum achieves 8.7%-434.7% higher compression ratios than baseline compressors.
Denum is 2.6x-37.7x faster than baseline compressors.
Integrating Denum's numeric token parsing into existing compressors improves their compression ratio by 11.8% and their compression speed by 37%.
Abstract
Parser-based log compressors have been widely explored in recent years because the explosive growth of log volumes makes the compression performance of general-purpose compressors unsatisfactory. These parser-based compressors preprocess logs by grouping them according to the parsing result and then feed the preprocessed files into a general-purpose compressor. However, parser-based compressors have limitations. First, the goals of parsing and compression are misaligned, so the inherent characteristics of logs are not fully utilized. In addition, the performance of parser-based compressors depends on the sample logs and is therefore very unstable. Moreover, parser-based compressors often incur a long processing time. To address these limitations, we propose Denum, a simple, general log compressor with high compression ratio and speed. The core insight is that a majority of the…
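To make the "preprocess, then feed into a general-purpose compressor" pattern concrete, here is a minimal Python sketch that pulls numeric tokens out of log lines, stores them in a separate stream, and compresses both streams with `lzma`. This is not Denum's algorithm, whose rules are not given in this summary. The regex `\d+`, the NUL placeholder, the two-stream layout, and the names `split_numeric`, `restore`, `compress_logs`, and `decompress_logs` are all illustrative assumptions. The sketch also assumes non-empty input whose lines contain neither newlines nor NUL characters.

```python
import lzma
import re

# Hypothetical choices for illustration only (not taken from the paper).
NUM = re.compile(r"\d+")   # a "numeric token" here is any run of digits
PLACEHOLDER = "\x00"       # assumed never to occur in the raw log lines


def split_numeric(lines):
    """Separate each line into a template and its numeric tokens."""
    templates, numbers = [], []
    for line in lines:
        numbers.extend(NUM.findall(line))  # kept as strings, so leading zeros survive
        templates.append(NUM.sub(lambda _m: PLACEHOLDER, line))
    return templates, numbers


def restore(templates, numbers):
    """Inverse of split_numeric: put the numbers back in order."""
    it = iter(numbers)
    lines = []
    for template in templates:
        parts = template.split(PLACEHOLDER)
        line = parts[0]
        for part in parts[1:]:
            line += next(it) + part
        lines.append(line)
    return lines


def compress_logs(lines):
    """Compress the template stream and the number stream separately."""
    templates, numbers = split_numeric(lines)
    return (
        lzma.compress("\n".join(templates).encode("utf-8")),
        lzma.compress("\n".join(numbers).encode("utf-8")),
    )


def decompress_logs(blobs):
    """Decompress both streams and rebuild the original lines."""
    template_blob, number_blob = blobs
    templates = lzma.decompress(template_blob).decode("utf-8").split("\n")
    number_text = lzma.decompress(number_blob).decode("utf-8")
    numbers = number_text.split("\n") if number_text else []
    return restore(templates, numbers)


if __name__ == "__main__":
    logs = [
        "2024-01-05 12:00:01 block 1001 served in 35 ms",
        "2024-01-05 12:00:02 block 1002 served in 007 ms",
    ]
    assert decompress_logs(compress_logs(logs)) == logs
```

Separating the streams makes the templates highly repetitive and groups similar numbers together, which is the general reason such preprocessing can help a downstream compressor. Whether it helps on a given log, and by how much, has to be measured.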
Taxonomy
Topics: Algorithms and Data Compression · Numerical Methods and Algorithms · Mathematics, Computing, and Information Processing
