A Simple, Yet Effective Approach to Finding Biases in Code Generation

Spyridon Mouselinos; Mateusz Malinowski; Henryk Michalewski

arXiv:2211.00609·cs.AI·May 10, 2023

TL;DR

This paper identifies biases in large language model-based code generation systems, introduces a modular analysis framework called 'block of influence', and proposes mitigation strategies through data transformation during fine-tuning.

Contribution

It presents a novel modular analysis method for biases in code generation models and demonstrates bias mitigation via data transformation techniques.

Findings

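01. Biases inherited from the large language model backbone can reduce the quality of generated code under specific circumstances.

02. The "block of influence" concept enables a modular decomposition of coding challenges, and automated interventions expose biases through the models' failure modes.

03. Applying the framework as a data transformation during fine-tuning acts as a mitigation strategy for these biases.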

Abstract

Recently, high-performing code generation systems based on large language models have surfaced. They are trained on massive corpora containing much more natural text than actual executable computer code. This work shows that current code generation systems exhibit undesired biases inherited from their large language model backbones, which can reduce the quality of the generated code under specific circumstances. To investigate the effect, we propose the "block of influence" concept, which enables a modular decomposition and analysis of the coding challenges. We introduce an automated intervention mechanism reminiscent of adversarial testing that exposes undesired biases through the failure modes of the models under test. Finally, we demonstrate how our framework can be used as a data transformation technique during fine-tuning, acting as a mitigation strategy for these biases.
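The intervention idea in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not enumerate the blocks, so the `Challenge` class, its three fields, the `rename_function` intervention, and the `toy_model` stand-in are all hypothetical. The sketch only shows the general shape of such a test: change one block of a prompt, leave the task itself unchanged, and check whether the generated code still passes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Challenge:
    """Hypothetical decomposition of a coding prompt into editable blocks."""
    name: str        # function name block
    args: str        # argument list block
    docstring: str   # task description block

    def render(self) -> str:
        return f'def {self.name}({self.args}):\n    """{self.docstring}"""\n'


def rename_function(ch: Challenge, new_name: str) -> Challenge:
    """Intervene on the name block only; the task description is untouched."""
    return replace(ch, name=new_name)


def passes(generate, ch: Challenge, tests) -> bool:
    """Complete the prompt with `generate`, run it, and check the unit tests."""
    source = ch.render() + generate(ch.render())
    namespace = {}
    try:
        exec(source, namespace)  # illustration only; sandbox real model output
        fn = namespace[ch.name]
        return all(fn(*inputs) == expected for inputs, expected in tests)
    except Exception:
        return False


def toy_model(prompt: str) -> str:
    """Stand-in for a code model that relies on the function name."""
    if prompt.startswith("def add("):
        return "    return a + b\n"
    return "    return None\n"


def exposes_bias(generate, ch: Challenge, intervened: Challenge, tests) -> bool:
    """A failure that appears only after the intervention hints at a bias."""
    return passes(generate, ch, tests) and not passes(generate, intervened, tests)


original = Challenge("add", "a, b", "Return the sum of a and b.")
intervened = rename_function(original, "func")
tests = [((1, 2), 3), ((0, 0), 0)]
bias_found = exposes_bias(toy_model, original, intervened, tests)
```

Here the toy model solves the task only when the function is called `add`, so renaming it flips the result even though the docstring still states the task in full. With a real model, `generate` would wrap a call to the system under test, and the generated code should run in a sandbox rather than through a bare `exec`.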

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Advanced Malware Detection Techniques