A Comparative Study of Semantic Log Representations for Software Log-based Anomaly Detection

Yuqing Wang; Ying Song; Xiaozhou Li; Nana Reinikainen; Mika V. Mäntylä

arXiv:2604.08028·cs.SE·April 10, 2026

TL;DR

This paper benchmarks various semantic log representation methods for anomaly detection, introduces QTyBERT to balance effectiveness and efficiency, and demonstrates its competitive performance on multiple datasets.

Contribution

It provides a comprehensive benchmark of existing methods and proposes QTyBERT, a novel semantic log representation that balances detection accuracy and computational efficiency.

Findings

01

BERT-based methods are more effective but slower for log embedding generation.

02

Static word embeddings are efficient but less effective for anomaly detection.

03

QTyBERT balances effectiveness and efficiency, with detection effectiveness comparable to the BERT-based method.

Abstract

Recent deep learning (DL) methods for log anomaly detection increasingly rely on semantic log representation methods that convert the textual content of log events into vector embeddings as input to DL models. However, these DL methods are typically evaluated as end-to-end pipelines, while the impact of different semantic representation methods is not well understood. In this paper, we benchmark widely used semantic log representation methods, including static word embedding methods (Word2Vec, GloVe, and FastText) and the BERT-based contextual embedding method, across diverse DL models for log-event level anomaly detection on three publicly available log datasets: BGL, Thunderbird, and Spirit. We identify an effectiveness-efficiency trade-off under CPU deployment settings: the BERT-based method is more effective, but incurs substantially longer log embedding generation time, limiting…
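
To make "semantic log representation" concrete, the following is a minimal sketch of one common way a static word embedding can turn a log message into a fixed-size vector: tokenize the message, look up each known token, and average the vectors. It is an illustration only, not the paper's pipeline or QTyBERT. The embedding table `TOY_EMBEDDINGS`, the dimension `DIM`, and the helper names `tokenize` and `embed_log_event` are all hypothetical; a real setup would load pretrained Word2Vec, GloVe, or FastText vectors instead.

```python
import re
from typing import Dict, List

# Hypothetical toy embedding table (assumption for illustration). Real static
# embeddings would come from a pretrained model with far more dimensions.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "kernel": [1.0, 0.0, 0.0],
    "panic": [0.0, 1.0, 0.0],
    "error": [0.0, 0.0, 1.0],
}
DIM = 3  # dimensionality of the toy vectors above


def tokenize(message: str) -> List[str]:
    """Lowercase the message and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", message.lower())


def embed_log_event(message: str) -> List[float]:
    """Average the vectors of all in-vocabulary tokens in the message.

    Out-of-vocabulary tokens are skipped; a message with no known tokens
    maps to the zero vector.
    """
    vectors = [TOY_EMBEDDINGS[t] for t in tokenize(message) if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    # zip(*vectors) groups the values of each dimension across all tokens.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    print(embed_log_event("Kernel panic: error 42"))
```

The resulting vector would be the input to a downstream DL model. A contextual method such as the BERT-based one replaces the table lookup with a forward pass through a transformer, which is consistent with the longer embedding generation time the abstract reports for CPU settings.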
