FAME: Failure-Aware Mixture-of-Experts for Message-Level Log Anomaly Detection
Huanchi Wang, Zihang Huang, Yifang Tian, Kristina Dzeparoska, Hans-Arno Jacobsen, Alberto Leon-Garcia

TL;DR
FAME is a novel, label-efficient framework that enables message-level log anomaly detection by combining large language models with a mixture-of-experts approach, improving accuracy and reducing annotation effort.
Contribution
The paper introduces FAME, a failure-aware mixture-of-experts model that leverages LLMs offline to enable fine-grained, message-level anomaly detection with minimal labeling.
Findings
FAME achieves F1 scores of 98.16 and 99.95 on the BGL and Thunderbird datasets, respectively.
FAME reduces annotation effort by a factor of 76 compared to traditional methods.
FAME detects 86.3% of anomalies from unseen EventIDs.
Abstract
Production systems generate millions of log lines daily, yet most anomaly detectors operate at the session or window level, flagging groups of lines rather than identifying the specific message responsible. This coarse granularity forces operators to inspect many routine lines per alert. Message-level detection offers finer granularity but remains challenging: a single event template may correspond to both normal and anomalous messages, failures arise from heterogeneous subsystems, and line-level labeling at scale is impractical. Although large language models (LLMs) can reason over log semantics, applying them to every line is too costly for continuous monitoring. We present FAME (Failure-Aware Mixture-of-Experts), a label-efficient message-level mixture-of-experts framework that uses an LLM only once offline. We annotate at most K labeled lines per template to derive binary…
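The abstract's "at most K lines per template" budget can be illustrated with a small sketch. This is not the authors' code: the abstract does not say how lines are chosen within a template, so uniform random sampling is an assumption here, and the function name `sample_lines_per_template` and the `(template_id, message)` input layout are hypothetical.

```python
import random
from collections import defaultdict


def sample_lines_per_template(lines, k, seed=0):
    """Pick at most k messages per template ID to send for annotation.

    `lines` is an iterable of (template_id, message) pairs.
    Uniform random sampling is an assumption, not the paper's stated method.
    """
    rng = random.Random(seed)

    # Group raw messages by the template (EventID) they were parsed into.
    by_template = defaultdict(list)
    for template_id, message in lines:
        by_template[template_id].append(message)

    # Cap the annotation budget at k messages per template.
    return {
        template_id: rng.sample(messages, min(k, len(messages)))
        for template_id, messages in by_template.items()
    }


# Hypothetical toy log: three lines for template E1, one for E2.
logs = [
    ("E1", "node 3 link up"),
    ("E1", "node 7 link up"),
    ("E1", "node 9 link up"),
    ("E2", "kernel panic on node 4"),
]
picked = sample_lines_per_template(logs, k=2)
```

Under this scheme the total number of lines sent for annotation is bounded by K times the number of distinct templates, regardless of how many raw log lines there are.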
