Towards Better Code Understanding in Decoder-Only Models with Contrastive Learning

Jiayi Lin; Yanlin Wang; Yibiao Yang; Lei Zhang; Yutao Xie

arXiv:2406.12326·cs.SE·February 12, 2026

TL;DR

This paper introduces CL4D, a contrastive learning framework that enhances decoder-only models' ability to understand code, enabling them to perform better on tasks like code search and clone detection without extensive retraining.

Contribution

The paper presents a novel contrastive learning approach to adapt decoder-only models for code understanding, demonstrating competitive performance on benchmark tasks.

Findings

01. CL4D improves semantic alignment of code representations.

02. Decoder-only models can be effectively adapted for understanding tasks.

03. Enhanced models outperform existing methods on code search and clone detection.

Abstract

Recent advances in large-scale code generation models have led to remarkable progress in producing high-quality code. These models are trained in a self-supervised manner on extensive unlabeled code corpora using a decoder-only architecture. However, despite their generative strength, decoder-only models often exhibit limited performance on code understanding tasks such as code search and clone detection, primarily due to their generation-oriented training objectives. Training large encoder-only models from scratch on massive code datasets can improve understanding ability, but it remains computationally expensive and time-consuming. In this paper, we explore a more efficient alternative by transferring knowledge from pre-trained decoder-only code generation models to code understanding tasks. We investigate how decoder-only architectures can be effectively adapted to learn…
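The abstract is truncated here and does not state the training objective, so the following is only a generic illustration of what a contrastive objective for code search can look like, not the CL4D loss itself. It is a minimal sketch of an in-batch InfoNCE loss, assuming each query embedding is paired with the code embedding at the same index and every other code embedding in the batch serves as a negative. The function names, the temperature value of 0.05, and the use of plain Python lists in place of model outputs are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce_loss(query_embs, code_embs, temperature=0.05):
    """Mean in-batch InfoNCE loss.

    Assumption: query_embs[i] and code_embs[i] form a positive pair, and
    all other code embeddings in the batch act as negatives for query i.
    The temperature default is illustrative, not taken from the paper.
    """
    if not query_embs or len(query_embs) != len(code_embs):
        raise ValueError("need equally sized, non-empty batches")

    queries = [l2_normalize(q) for q in query_embs]
    codes = [l2_normalize(c) for c in code_embs]

    total = 0.0
    for i, q in enumerate(queries):
        # Cosine similarity of query i with every code snippet, scaled by temperature.
        logits = [sum(a * b for a, b in zip(q, c)) / temperature for c in codes]
        # Numerically stable log-sum-exp over the whole batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching index as the target class.
        total += log_denominator - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # each query matches its own snippet
    shuffled = [[0.0, 1.0], [1.0, 0.0]]  # pairs deliberately mismatched
    print(info_nce_loss(queries, aligned) < info_nce_loss(queries, shuffled))
```

In a real setup the embeddings would come from the decoder-only model, for example by pooling its hidden states, and the loss would be minimized with gradient descent so that matching query and code pairs move closer together than mismatched ones.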

Taxonomy

Topics: Model-Driven Software Engineering Techniques · Software Engineering Research · Natural Language Processing Techniques

Methods: Contrastive Learning