Single-Language Evidence Is Insufficient for Automated Logging: A Multilingual Benchmark and Empirical Study with LLMs

Renyi Zhong; Yichen Li; Yulun Wu; Jinxi Kuang; Yintong Huo; Michael R. Lyu

arXiv:2604.17529 · cs.SE · April 21, 2026

TL;DR

This study introduces MultiLogBench, a multilingual benchmark and empirical analysis of automated logging across six programming languages, revealing significant cross-language differences and emphasizing the need for multilingual evaluation.

Contribution

The paper presents a comprehensive multilingual benchmark and empirical study for automated logging, highlighting language-specific challenges and the importance of maintenance-oriented validation.

Findings

01 Framework-anchor matching is highly language-sensitive.

02 Loop and nested-call sites are the most challenging contexts.

03 Top-tier models maintain stable rankings across languages.

Abstract

Logging statements are central to debugging, failure diagnosis, and production observability, yet writing them requires developers to decide where to place a logging statement, which API and severity level to use, and what runtime information to expose. Automated logging aims to reduce this burden, but existing evidence remains dominated by Java-centric repository-snapshot datasets. It is therefore unclear whether conclusions about model behavior and model selection generalize across programming-language ecosystems or to realistic code evolution. This paper presents MultiLogBench, a multilingual benchmark and empirical study spanning six programming-language ecosystems. MultiLogBench contains 63,965 production-code repository-snapshot instances, 744 revision-history cases where developers introduce logging statements during maintenance, and a paired transformed revision-history branch for…
