Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks
Kyi Shin Khant, Hong Yi Lin, Patanamon Thongtanunam

TL;DR
This study evaluates curriculum learning for pre-trained code models, using code complexity metrics as difficulty measures. It finds limited benefits and signs of catastrophic forgetting on software engineering tasks.
Contribution
The study provides an empirical assessment of curriculum learning on CodeT5 for software engineering (SE) tasks, using conventional difficulty measures (code length and cyclomatic complexity). It highlights the challenges and limitations of this approach.
Findings
Model performance saturates early, indicating limited learning capacity.
Results that contrast with prior studies suggest challenges in applying CL to code models.
Signs of catastrophic forgetting and shortcut learning were observed during training.
Abstract
Learning-based techniques, especially advanced pre-trained models for code, have demonstrated capabilities in code understanding and generation, solving diverse software engineering (SE) tasks. Despite the promising results, current training approaches may not fully optimize model performance, as they typically involve learning from randomly shuffled training data. Recent work shows that Curriculum Learning (CL) can improve performance on code-related tasks through incremental learning based on the difficulty of synthetic code. Yet, the effectiveness of CL with conventional difficulty measures in SE tasks remains largely unexplored. In this study, we explore two conventional code metrics, code length and cyclomatic complexity, to determine difficulty levels. We investigate how the pre-trained code model (CodeT5) learns under CL, through the tasks of code clone detection and code…
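The abstract describes ordering training data from easy to hard using code length and cyclomatic complexity as difficulty measures. What follows is a minimal sketch of that idea under several assumptions. The samples are taken to be Python functions so that the standard `ast` module can parse them, although the paper's datasets may use other languages. Length is counted in whitespace-separated tokens rather than model subword tokens. Complexity is approximated as 1 plus the number of decision points. The cumulative "baby-step" pacing is one common CL schedule, not necessarily the one used in the paper. All helper names (`cyclomatic_complexity`, `code_length`, `build_curriculum`, `baby_step_stages`) are hypothetical.

```python
import ast
from typing import Callable, List


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points (assumption)."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per additional operand
            decisions += len(node.values) - 1
    return 1 + decisions


def code_length(source: str) -> int:
    """Code length as the number of whitespace-separated tokens (assumption)."""
    return len(source.split())


def build_curriculum(samples: List[str], difficulty: Callable[[str], int],
                     n_buckets: int) -> List[List[str]]:
    """Sort samples easy-to-hard by `difficulty` and split them into up to n_buckets stages."""
    ordered = sorted(samples, key=difficulty)  # stable: ties keep input order
    if not ordered:
        return []
    size = -(-len(ordered) // n_buckets)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def baby_step_stages(buckets: List[List[str]]) -> List[List[str]]:
    """Stage k trains on buckets 0..k combined (hypothetical cumulative schedule)."""
    stages, seen = [], []
    for bucket in buckets:
        seen = seen + bucket
        stages.append(seen)
    return stages


# Usage with three toy samples
samples = [
    "def f(x):\n    if x > 0 and x < 10:\n        return x\n    return 0\n",
    "def g():\n    return 1\n",
    "def h(xs):\n    for x in xs:\n        while x:\n            x -= 1\n    return xs\n",
]
by_complexity = build_curriculum(samples, cyclomatic_complexity, n_buckets=2)
by_length = build_curriculum(samples, code_length, n_buckets=3)
print([len(stage) for stage in baby_step_stages(by_complexity)])
```

Here `g` (complexity 1) comes first under the complexity ordering, while `f` and `h` (both complexity 3) keep their input order. Under the length ordering the samples are sorted as `g`, `h`, `f`. Each stage would then be fed to the model in turn, in place of a single randomly shuffled pass over the data.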
