When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair
Wenqiang Luo, Jacky Wai Keung, Boyang Yang, He Ye, Claire Le Goues, Tegawende F. Bissyande, Haoye Tian, Bach Le

TL;DR
This paper explores using federated learning to fine-tune large language models for automated program repair while preserving data privacy across private code repositories.
Contribution
It demonstrates that federated fine-tuning improves program repair capabilities and shows robustness to data heterogeneity in real-world industry scenarios.
Findings
Federated fine-tuning enhances LLM-based program repair.
Heterogeneous code data has negligible impact on fine-tuning effectiveness.
Different federated algorithms have unique strengths depending on the LLM used.
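To make the idea of federated fine-tuning concrete, the following is a minimal sketch of one server-side aggregation round in the style of federated averaging (FedAvg), a widely used baseline. This summary does not say which algorithms the paper evaluates, so FedAvg is an illustrative assumption here. The function name `fedavg`, the `Weights` type, and the toy parameter name `"w"` are all hypothetical. Each client fine-tunes on its own private code and shares only parameter values, never the code itself.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighted by each client's local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weighted mean of each coordinate across all clients.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Two clients with different amounts of private data (toy values).
global_update = fedavg(
    [{"w": [0.0, 4.0]}, {"w": [4.0, 0.0]}],
    [1, 3],
)
print(global_update)
```

In practice the shared parameters would typically be the fine-tuned model weights or a smaller set of adapter weights, and the server would broadcast the aggregate back to clients for the next round.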
Abstract
Software systems have been evolving rapidly and inevitably introducing bugs at an increasing rate, leading to significant resources being consumed by software maintenance. Recently, large language models (LLMs) have demonstrated remarkable potential in enhancing software development and maintenance practices, particularly in automated program repair (APR), where they improve the accuracy and efficiency of bug fixing. However, LLM-based APR relies heavily on high-quality code repositories. A larger portion of existing code repositories is privately held as proprietary assets across various industries; these repositories reflect more diversity and nuance, since real-world industries often have more extensive software development practices than public datasets alone can cover. Therefore, utilizing private datasets shows significant potential in enhancing software development and…
Taxonomy
Topics: Advanced Data Storage Technologies · Privacy-Preserving Technologies in Data
