From Effectiveness to Efficiency: Uncovering Linguistic Bias in Large Language Model-based Code Generation
Weipeng Jiang, Xuanqi Gao, Juan Zhai, Shiqing Ma, Xiaoyu Zhang, Ziyan Lei, and Chao Shen

TL;DR
This study investigates linguistic bias in large language models for code generation. Using a unified evaluation framework and an empirical analysis, it shows that the correctness and efficiency of generated code vary with the natural language of the task description.
Contribution
The paper introduces a unified evaluation framework and conducts the first empirical study on linguistic bias in LLM-based code generation across English and Chinese descriptions.
Findings
Correctness of LLM-generated code differs between the English and Chinese descriptions on about 12% of tasks
About 39% of generated code differs in efficiency depending on the description language
Linguistic bias is common across the evaluated code generation models
Abstract
Large Language Models (LLMs) have demonstrated promising capabilities for code generation. While existing benchmarks evaluate the correctness and efficiency of LLM-generated code, the potential linguistic bias - where code quality varies based on the natural language used to describe programming tasks - remains underexplored. In this paper, we aim to investigate this linguistic bias through the lens of English and Chinese. To facilitate our investigation, we present a unified evaluation framework comprising a curated dataset of 52 Python programming questions with parallel bilingual task descriptions, automated correctness verification, and efficiency quantification tools based on runtime complexity estimation. Based on this framework, we conduct the first empirical study towards the linguistic bias in LLM-generated code on eight popular LCGMs, as well as GPT-3.5-Turbo and GPT-4. We…
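The correctness comparison described in the abstract can be illustrated with a minimal sketch. It assumes each task has already been run through automated tests once per description language, and it simply counts how often the pass/fail outcome differs. The names TaskResult and correctness_gap_rate and the toy data are hypothetical, introduced only for illustration; they are not the paper's framework or results.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    """Outcome of one task under both descriptions (hypothetical structure)."""
    task_id: str
    passed_en: bool  # code generated from the English description passed its tests
    passed_zh: bool  # code generated from the Chinese description passed its tests


def correctness_gap_rate(results: List[TaskResult]) -> float:
    """Return the fraction of tasks whose pass/fail outcome differs by language."""
    if not results:
        return 0.0
    differing = sum(1 for r in results if r.passed_en != r.passed_zh)
    return differing / len(results)


# Toy data for illustration only; these are not results from the paper.
demo = [
    TaskResult("t1", True, True),
    TaskResult("t2", True, False),
    TaskResult("t3", False, False),
    TaskResult("t4", False, True),
]
gap = correctness_gap_rate(demo)
print(f"Tasks with differing correctness: {gap:.0%}")
```

A comparable per-task comparison could be made for efficiency by replacing the boolean outcomes with an estimated complexity class per language, though how the paper quantifies efficiency in detail is not shown in this summary.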
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification
