TL;DR
InferLog is a novel method that significantly accelerates LLM inference for online log parsing by optimizing prefix caching and configuration tuning, enabling faster and more efficient log analysis in high-volume environments.
Contribution
InferLog introduces the first LLM inference optimization approach for online log parsing, focusing on accelerating inference without sacrificing accuracy.
Findings
InferLog achieves significant speedup over existing methods.
It maintains high parsing accuracy while improving inference efficiency.
Experimental results validate its effectiveness on real log datasets.
Abstract
Modern software systems generate massive volumes of runtime logs, necessitating efficient and accurate log parsing to enable critical downstream tasks such as anomaly detection and root cause analysis. Recently, large language models (LLMs) have achieved high accuracy on log parsing, but their deployment in production environments faces two major limitations: (1) the privacy risks associated with commercial LLMs, which drive the adoption of local deployment, and (2) the stringent latency and throughput requirements imposed by high-volume log streams, which existing LLM-based parsers fail to meet. Although recent efforts have reduced the number of LLM queries, they overlook the high latency of the LLM invocations themselves, where concurrent log parsing requests can cause severe performance degradation of the LLM inference system. In this study, we present InferLog, the first LLM inference…
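To make the prefix-caching idea mentioned in the TL;DR concrete, here is a minimal sketch and not InferLog's actual method. Inference engines with prefix caching can reuse cached state for the leading part of a prompt that matches an earlier request, so prompts that keep shared content first and in a stable order share more. The instruction text, the demonstration pairs, and the helpers `build_prompt` and `shared_prefix_len` are all hypothetical, and character-level prefix length is used only as a rough proxy for the token-level prefix a real engine would match.

```python
from os.path import commonprefix

# Hypothetical fixed instruction shared by every parsing request.
INSTRUCTION = "Extract the log template. Replace variable parts with <*>.\n"


def build_prompt(examples, log_line):
    """Put the fixed instruction and demonstrations first, the query log last."""
    demo_text = "".join(f"Log: {log}\nTemplate: {tpl}\n" for log, tpl in examples)
    return f"{INSTRUCTION}{demo_text}Log: {log_line}\nTemplate:"


def shared_prefix_len(a, b):
    """Length of the common leading substring (a proxy for reusable cache)."""
    return len(commonprefix([a, b]))


# Hypothetical demonstration pairs (log line, template).
examples = [
    ("Connection from 10.0.0.1 closed", "Connection from <*> closed"),
    ("Took 35 ms to load block 7", "Took <*> ms to load block <*>"),
]

p1 = build_prompt(examples, "Connection from 10.0.0.9 closed")
p2 = build_prompt(examples, "Took 12 ms to load block 3")
# Same demonstrations, but in a different order.
p3 = build_prompt(list(reversed(examples)), "Took 12 ms to load block 3")

same_order = shared_prefix_len(p1, p2)
diff_order = shared_prefix_len(p1, p3)
print(f"shared prefix, same demo order:      {same_order} chars")
print(f"shared prefix, different demo order: {diff_order} chars")
```

With identical demonstrations in the same order, the two prompts diverge only at the query log line; reordering the demonstrations makes them diverge right after the instruction, which leaves far less for a prefix cache to reuse.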
