I Can't Share Code, but I need Translation -- An Empirical Study on Code Translation through Federated LLM

Jahnavi Kumar; Venkata Lakshmana Sasaank Janapati; Mokshith Reddy Tanguturi; Sridhar Chimalakonda

arXiv:2501.05724·cs.SE·January 13, 2025

TL;DR

This paper presents an empirical study on federated learning for code translation using large language models, demonstrating improved translation accuracy while preserving data privacy.

Contribution

The paper introduces a federated LLM approach for code translation that enables collaborative training without sharing sensitive code data.

Findings

1. Over 40% improvement in CodeBLEU score with FedLLM
2. Effective translation between C# and Java
3. Federated approach preserves data privacy while enhancing performance

Abstract

Owing to the rapid evolution of technologies and project requirements, organizations need to upgrade the code base of their software projects to a new version of the programming language, or even translate it to an entirely new language. However, code translation is resource-intensive and requires expertise in both the source and target languages. While researchers have made progress in automating translation between legacy and modern languages, recent work has increasingly turned to pre-trained Large Language Models (LLMs) to translate efficiently. Given the proprietary nature of code, organizations prefer fine-tuning LLMs locally rather than relying on external APIs. This is one of the first empirical studies that proposes a Federated LLM-based approach for code translation. The proposed approach enables clients to jointly train a code translator without sharing sensitive data. This study…
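
The abstract does not say how client updates are combined, so the following is only a minimal sketch of one common choice, federated averaging (FedAvg), and not necessarily the paper's method. It assumes each client fine-tunes locally and sends back only named weight vectors (for example, adapter parameters), which a server averages in proportion to each client's number of training examples. The function name `federated_average` and the weight layout are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client weights, weighted by local dataset size (FedAvg).

    Only weights are exchanged; the clients' source code never leaves them.
    Assumes every client reports the same parameter names and shapes.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    averaged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


if __name__ == "__main__":
    # Two clients; the second holds three times as many examples.
    clients = [{"adapter": [1.0, 2.0]}, {"adapter": [3.0, 4.0]}]
    print(federated_average(clients, [1, 3]))
```

In a full training loop, the server would send the averaged weights back to every client for the next round of local fine-tuning.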

Taxonomy

Topics: Digital Rights Management and Security