Curriculum Learning for Small Code Language Models
Marwa Na\"ir, Kamel Yamani, Lynda Said Lhadj, Riyadh Baghdadi

TL;DR
This paper demonstrates that curriculum learning can significantly improve the performance of small code language models, especially in code execution tasks, challenging prior assumptions about its effectiveness.
Contribution
It introduces a novel curriculum learning schedule and a code difficulty assessment metric tailored for small code language models, showing notable performance improvements.
Findings
Curriculum learning improves code execution accuracy in small models.
Effect on code completion tasks is less significant.
Proposes a new code difficulty metric for curriculum design.
Abstract
Code language models have emerged as useful tools for various programming tasks, yet they often struggle when it comes to complex ones. In this paper, we explore the potential of curriculum learning in enhancing the performance of these models. While prior research has suggested that curriculum learning does not necessarily help in improving the performance of language models, our results surprisingly show that this may not be the case for code language models. We demonstrate that a well-designed curriculum learning approach significantly improves the accuracy of small decoder-only code language models on the task of code execution, while its effect on code completion is less significant. To explore the potential of curriculum learning, we train multiple GPT models with 1 million parameters each to predict the next token and evaluate them on code completion and execution tasks. Our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsModel-Driven Software Engineering Techniques
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Attention Dropout · Adam · Dropout · Weight Decay
