Bridge-Coder: Unlocking LLMs' Potential to Overcome Language Gaps in Low-Resource Code
Jipeng Zhang, Jianshu Zhang, Yuanzhe Li, Renjie Pi, Rui Pan, Runtao Liu, Ziqiang Zheng, Tong Zhang

TL;DR
Bridge-Coder is a novel method that leverages large language models' capabilities to improve code generation in low-resource programming languages, addressing the language gap and promoting equitable technological development.
Contribution
The paper introduces Bridge-Coder, a two-stage approach that uses LLMs' existing knowledge to improve performance on low-resource programming languages (LRPLs), sidestepping the cost of manual data annotation and the poor quality of LLM-generated LRPL code.
Findings
Significant performance improvements on multiple LRPLs.
Effective dataset creation using LLMs' general knowledge.
Enhanced alignment between natural language and LRPLs (NL-LRPL) through bridging techniques.
Abstract
Large Language Models (LLMs) demonstrate strong proficiency in generating code for high-resource programming languages (HRPLs) like Python but struggle significantly with low-resource programming languages (LRPLs) such as Racket or D. This performance gap deepens the digital divide, preventing developers using LRPLs from benefiting equally from LLM advancements and reinforcing disparities in innovation within underrepresented programming communities. While generating additional training data for LRPLs is promising, it faces two key challenges: manual annotation is labor-intensive and costly, and LLM-generated LRPL code is often of subpar quality. The underlying cause of this issue is the gap between natural language and programming language (NL-PL Gap), which is especially pronounced in LRPLs due to limited aligned data. In this work, we introduce a novel approach called…
Taxonomy
Topics: Digital Rights Management and Security · Library Science and Information Systems
