Fine-Tuning Multilingual Language Models for Code Review: An Empirical Study on Industrial C# Projects
Igli Begolli, Meltem Aksoy, Daniel Neider

TL;DR
This paper empirically evaluates the effectiveness of monolingual fine-tuning of multilingual language models for automated code review tasks in industrial C# projects, highlighting the impact of language alignment and task-specific training.
Contribution
It demonstrates that monolingual fine-tuning enhances model performance for code review tasks and provides insights into the effects of language configurations in training data.
Findings
Monolingual fine-tuning improves accuracy over multilingual models.
Models support routine review tasks but are less effective for complex changes.
Language alignment in training data is crucial for optimal performance.
Abstract
Code review is essential for maintaining software quality but is often time-consuming and cognitively demanding, especially in industrial environments. Recent advancements in language models (LMs) have opened new avenues for automating core review tasks. This study presents an empirical evaluation of the effect of monolingual fine-tuning on the performance of open-source LMs across three key automated code review tasks: Code Change Quality Estimation, Review Comment Generation, and Code Refinement. We fine-tuned three distinct models, CodeReviewer, CodeLlama-7B, and DeepSeek-R1-Distill, on a C#-specific dataset combining public benchmarks with industrial repositories. Our study investigates how different configurations of programming languages and natural languages in the training data affect LM performance, particularly in comment generation. Additionally, we benchmark the fine-tuned models against…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Topic Modeling
