Automatically Recommend Code Updates: Are We There Yet?
Yue Liu, Chakkrit Tantithamthavorn, Yonghui Liu, Patanamon Thongtanunam, Li Li

TL;DR
This paper evaluates the effectiveness of large pre-trained Code Language Models in recommending code updates, revealing significant gaps between their perceived and actual performance in real-world scenarios, especially over time and with complex changes.
Contribution
It provides the first extensive evaluation of CodeLMs for code update recommendation, highlighting their limitations in realistic, temporal, and complex update settings.
Findings
CodeLMs perform well in evaluation settings that ignore temporal information
They struggle in realistic, time-wise scenarios and generalize poorly
Performance drops for larger methods and complex updates
Abstract
In recent years, large pre-trained Language Models of Code (CodeLMs) have shown promising results on various software engineering tasks. One such task is automatic code update recommendation, which transforms outdated code snippets into their approved and revised counterparts. Although many CodeLM-based approaches have been proposed, claiming high accuracy, their effectiveness and reliability on real-world code update tasks remain questionable. In this paper, we present the first extensive evaluation of state-of-the-art CodeLMs for automatically recommending code updates. We assess their performance on two diverse datasets of paired updated methods, considering factors such as temporal evolution, project specificity, method size, and update complexity. Our results reveal that while CodeLMs perform well in settings that ignore temporal information, they struggle in more realistic…
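The abstract contrasts settings that ignore temporal information with more realistic, time-wise ones. The sketch below illustrates that difference as two ways of splitting a dataset of paired updated methods. It is a minimal illustration, not the paper's actual setup: the record fields (`date`, `before`, `after`), the 20% test ratio, and both function names are assumptions made for the example.

```python
import random
from datetime import date


def time_ignored_split(samples, test_ratio=0.2, seed=0):
    """Shuffle before splitting, so the training set can contain
    updates that were made after some of the test updates."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_ratio)
    return shuffled[:cut], shuffled[cut:]


def time_wise_split(samples, test_ratio=0.2):
    """Sort by date before splitting, so every test update is at
    least as recent as every training update."""
    ordered = sorted(samples, key=lambda s: s["date"])
    cut = len(ordered) - int(len(ordered) * test_ratio)
    return ordered[:cut], ordered[cut:]


# Hypothetical records: one (outdated, updated) method pair per day.
samples = [
    {"date": date(2020, 1, day), "before": "old_code", "after": "new_code"}
    for day in range(1, 11)
]

train, test = time_wise_split(samples)
# The model never sees an update newer than anything it is tested on.
assert max(s["date"] for s in train) <= min(s["date"] for s in test)
```

With the shuffled split, a model can be trained on updates that postdate the ones it is evaluated on, which is one plausible reason scores in that setting look better than in a time-ordered one.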
Taxonomy
Topics: Mobile and Web Applications · Green IT and Sustainability · Software System Performance and Reliability
