Active Code Learning: Benchmarking Sample-Efficient Training of Code Models
Qiang Hu, Yuejun Guo, Xiaofei Xie, Maxime Cordy, Lei Ma, Mike Papadakis, and Yves Le Traon

TL;DR
This paper introduces the first benchmark for active learning in code models, evaluating various acquisition functions and revealing key factors affecting sample efficiency and performance in code-related tasks.
Contribution
It builds a comprehensive benchmark for active code learning, adapting acquisition functions for code tasks and analyzing their effectiveness and influencing factors.
Findings
Feature selection significantly impacts active learning performance.
Output vector-based data selection outperforms other methods.
Active learning shows limited effectiveness in code summarization tasks.
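To make "output vector-based data selection" concrete, here is a minimal sketch of one common acquisition function of that kind: ranking unlabeled samples by the entropy of the model's predicted class probabilities and sending the most uncertain ones for labeling. This is an illustration under assumptions, not the paper's implementation; this summary does not say which functions the benchmark includes, and the names `entropy`, `select_by_entropy`, and `pool_outputs`, along with the probability values, are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(output_vectors, budget):
    """Return indices of the `budget` samples whose outputs are most uncertain."""
    ranked = sorted(
        range(len(output_vectors)),
        key=lambda i: entropy(output_vectors[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical softmax outputs for four unlabeled code samples (3 classes).
pool_outputs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

selected = select_by_entropy(pool_outputs, budget=2)
print(selected)
```

In an active learning loop, the selected samples would be labeled, added to the training set, and the model retrained before the next selection round.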
Abstract
The costly human effort required to prepare the training data of machine learning (ML) models hinders their practical development and usage in software engineering (ML4Code), especially for those with limited budgets. Therefore, efficiently training models of code with less human effort has become an emergent problem. Active learning is such a technique to address this issue that allows developers to train a model with reduced data while producing models with desired performance, which has been well studied in computer vision and natural language processing domains. Unfortunately, there is no such work that explores the effectiveness of active learning for code models. In this paper, we bridge this gap by building the first benchmark to study this critical problem - active code learning. Specifically, we collect 11 acquisition functions (which are used for data selection in active…
Taxonomy
Topics: Machine Learning and Algorithms · Software Engineering Research · Software Testing and Debugging Techniques
