Multi-Agent Collaboration for Multilingual Code Instruction Tuning

Jian Yang; Wei Zhang; Jiaxi Yang; Yibo Miao; Shanghaoran Quan; Zhenhe Wu; Qiyao Peng; Liqun Yang; Tianyu Liu; Zeyu Cui; Binyuan Hui; Junyang Lin

arXiv:2502.07487·cs.CL·February 12, 2025



TL;DR

This paper introduces a multi-agent collaboration framework that enhances multilingual code instruction tuning by enabling knowledge transfer among different programming languages, significantly improving code LLM performance across languages.

Contribution

It proposes a novel multi-agent system with shared memory for cross-lingual instruction data generation and collaboration, advancing multilingual code understanding and generation.

Findings

01. Qwen2.5-xCoder outperforms existing models on multilingual benchmarks.

02. The framework effectively reduces the cross-lingual gap in code tasks.

03. Knowledge transfer among languages improves overall model performance.

Abstract

Recent advances in code understanding and generation demonstrate that code LLMs fine-tuned on a high-quality instruction dataset can gain powerful capabilities to address a wide range of code-related tasks. However, most existing methods view each programming language in isolation and ignore knowledge transfer among different programming languages. To bridge the gap among programming languages, we introduce a novel multi-agent collaboration framework to enhance multilingual instruction tuning for code LLMs, in which multiple language-specific intelligent agents with generation memory work together to transfer knowledge from one language to another efficiently and effectively. Specifically, we first generate language-specific instruction data from code snippets and then provide the generated data as seed data for the language-specific agents.…
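The two steps the abstract names (seed instructions generated from code snippets, then language-specific agents with generation memory passing knowledge across languages) can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the authors' implementation: the names Sample, LanguageAgent, seed, adapt, and collaborate are hypothetical, and the string templates stand in for the LLM calls a real pipeline would make.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    language: str      # language the instruction targets
    instruction: str   # generated instruction text
    snippet: str       # code snippet the instruction was derived from
    origin: str        # "seed", or the language this sample was adapted from


@dataclass
class LanguageAgent:
    """Hypothetical language-specific agent with a generation memory."""
    language: str
    memory: list = field(default_factory=list)

    def seed(self, snippet: str) -> Sample:
        # Stand-in for an LLM call that writes an instruction from a snippet.
        sample = Sample(
            self.language,
            f"Write {self.language} code equivalent to: {snippet}",
            snippet,
            "seed",
        )
        self.memory.append(sample)
        return sample

    def adapt(self, other: Sample) -> Sample:
        # Stand-in for cross-lingual transfer: reuse another agent's task
        # as a new instruction in this agent's own language.
        sample = Sample(
            self.language,
            f"Port this {other.language} task to {self.language}: {other.snippet}",
            other.snippet,
            other.language,
        )
        self.memory.append(sample)
        return sample


def collaborate(snippets: dict) -> list:
    """Seed one agent per language, then let every other agent adapt each seed."""
    agents = {lang: LanguageAgent(lang) for lang in snippets}
    seeds = [agents[lang].seed(s) for lang, items in snippets.items() for s in items]
    transferred = [
        agent.adapt(seed)
        for seed in seeds
        for lang, agent in agents.items()
        if lang != seed.language
    ]
    return seeds + transferred


if __name__ == "__main__":
    data = collaborate({
        "Python": ["def add(a, b): return a + b"],
        "Rust": ["fn square(x: i32) -> i32 { x * x }"],
    })
    for sample in data:
        print(sample.language, "<-", sample.origin, "|", sample.instruction)
```

In this toy version each agent's memory simply records what it has produced; how the framework in the paper actually uses generation memory is not specified in the truncated abstract above.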



Taxonomy

Topics: Innovative Teaching and Learning Methods · Multi-Agent Systems and Negotiation · Speech and dialogue systems