English Please: Evaluating Machine Translation with Large Language Models for Multilingual Bug Reports
Avinash Patil, Siru Tao, Aryan Jadon

TL;DR
This paper evaluates machine translation systems, including large language models, on translating multilingual bug reports and identifying their source languages. No single system leads on every task, which underscores the importance of task-specific evaluation.
Contribution
It provides the first comprehensive comparison of MT models on bug report translation and source language identification, highlighting their strengths and limitations.
Findings
ChatGPT excels in semantic and lexical translation quality.
Claude and Mistral achieve the highest F1-scores in language identification.
AWS Translate has the highest accuracy in source language detection.
Abstract
Accurate translation of bug reports is critical for efficient collaboration in global software development. In this study, we conduct the first comprehensive evaluation of machine translation (MT) performance on bug reports, analyzing the capabilities of DeepL, AWS Translate, and large language models such as ChatGPT, Claude, Gemini, LLaMA, and Mistral using data from the Visual Studio Code GitHub repository, specifically focusing on reports labeled with the english-please tag. To assess both translation quality and source language identification accuracy, we employ a range of MT evaluation metrics (BLEU, BERTScore, COMET, METEOR, and ROUGE) alongside classification metrics such as accuracy, precision, recall, and F1-score. Our findings reveal that while ChatGPT (gpt-4o) excels in semantic and lexical translation quality, it does not lead in source language identification.…
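The classification metrics named in the abstract can be illustrated with a small sketch. The function below is not the authors' evaluation code; the name `lang_id_metrics`, the use of macro-averaging, and the toy labels are all assumptions made for illustration, since this summary does not say how the paper averages per-language scores. It shows one way accuracy and F1-score can rank systems differently when some languages are rare.

```python
from collections import Counter


def lang_id_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Hypothetical helper: macro-averaging (an unweighted mean over
    languages) is an assumption, not something stated in the paper summary.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted language gets a false positive
            fn[truth] += 1  # true language gets a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        predicted = tp[lab] + fp[lab]
        actual = tp[lab] + fn[lab]
        prec = tp[lab] / predicted if predicted else 0.0
        rec = tp[lab] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data (invented): a detector that labels almost everything "zh".
true_langs = ["zh", "zh", "zh", "zh", "ja", "es"]
pred_langs = ["zh", "zh", "zh", "zh", "zh", "es"]
print(lang_id_metrics(true_langs, pred_langs))
```

On this toy data the detector is right on five of six reports (accuracy of about 0.83), but it never identifies the single Japanese report, so the macro-averaged F1 falls to about 0.63. A system with slightly lower accuracy but better coverage of rare languages could therefore score a higher F1.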
Taxonomy
Topics: Topic Modeling · Scientific Computing and Data Management · Web Data Mining and Analysis
Methods: LLaMA
