Towards Better Code Understanding in Decoder-Only Models with Contrastive Learning
Jiayi Lin, Yanlin Wang, Yibiao Yang, Lei Zhang, Yutao Xie

TL;DR
This paper introduces CL4D, a contrastive learning framework that enhances decoder-only models' ability to understand code, enabling them to perform better on tasks like code search and clone detection without extensive retraining.
Contribution
The paper presents a novel contrastive learning approach to adapt decoder-only models for code understanding, demonstrating competitive performance on benchmark tasks.
Findings
CL4D improves semantic alignment of code representations.
Decoder-only models can be effectively adapted for understanding tasks.
Enhanced models outperform existing methods on code search and clone detection.
Abstract
Recent advances in large-scale code generation models have led to remarkable progress in producing high-quality code. These models are trained in a self-supervised manner on extensive unlabeled code corpora using a decoder-only architecture. However, despite their generative strength, decoder-only models often exhibit limited performance on code understanding tasks such as code search and clone detection, primarily due to their generation-oriented training objectives. Training large encoder-only models from scratch on massive code datasets can improve understanding ability, but it remains computationally expensive and time-consuming. In this paper, we explore a more efficient alternative by transferring knowledge from pre-trained decoder-only code generation models to code understanding tasks. We investigate how decoder-only architectures can be effectively adapted to learn…
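To make the general idea concrete, here is a minimal sketch of an in-batch contrastive (InfoNCE-style) objective over paired query and code embeddings. It is not CL4D's actual objective, which this summary does not specify. The function names, the temperature value, and the use of last-token pooling to get one vector per sequence from a decoder-only model are all assumptions made for illustration.

```python
import numpy as np


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padding token per sequence.

    Assumption: sequences are right-padded. Last-token pooling is one common
    choice for causal models, since only the final token attends to the whole
    input; the summary does not say which pooling the paper uses.
    """
    last_idx = attention_mask.sum(axis=1) - 1          # (batch,)
    batch_idx = np.arange(hidden_states.shape[0])
    return hidden_states[batch_idx, last_idx]          # (batch, dim)


def info_nce_loss(query_emb, code_emb, temperature=0.05):
    """In-batch contrastive loss: row i of query_emb pairs with row i of code_emb.

    Every other row in the batch serves as a negative. The temperature is a
    hypothetical default, not a value taken from the paper.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    logits = (q @ c.T) / temperature                   # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))         # positives on the diagonal


# Toy usage with random stand-ins for model outputs (batch=4, seq=6, dim=8).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 6, 8))
mask = np.array([[1, 1, 1, 0, 0, 0],
                 [1, 1, 1, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0]])
code_vecs = last_token_pool(hidden, mask)
query_vecs = code_vecs + 0.1 * rng.normal(size=code_vecs.shape)
loss = info_nce_loss(query_vecs, code_vecs)
```

In a real fine-tuning setup the embeddings would come from the pre-trained decoder-only model and the loss would be minimized with an autodiff framework; NumPy is used here only to keep the sketch self-contained.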
Taxonomy
Topics: Model-Driven Software Engineering Techniques · Software Engineering Research · Natural Language Processing Techniques
Methods: Contrastive Learning
