Bridge and Hint: Extending Pre-trained Language Models for Long-Range Code
Yujia Chen, Cuiyun Gao, Zezhou Yang, Hongyu Zhang, Qing Liao

TL;DR
This paper introduces EXPO, a framework that enhances pre-trained language models for long-range code understanding by using novel memory mechanisms, leading to improved performance on code intelligence tasks.
Contribution
The paper proposes EXPO, which adds two memory mechanisms, Bridge Memory and Hint Memory, to existing PLMs so that they can model long-range code more effectively.
Findings
EXPO significantly improves model performance on code tasks.
The dual-memory approach enhances global and local code understanding.
Experimental results show notable gains across multiple models and tasks.
Abstract
In the field of code intelligence, effectively modeling long-range code poses a significant challenge. Existing pre-trained language models (PLMs) such as UniXcoder have achieved remarkable success, but they still face difficulties with long code inputs. This is mainly due to their limited capacity to maintain contextual continuity and memorize the key information over long-range code. To alleviate the difficulties, we propose EXPO, a framework for EXtending Pre-trained language models for lOng-range code. EXPO incorporates two innovative memory mechanisms we propose in this paper: Bridge Memory and Hint Memory. Bridge Memory uses a tagging mechanism to connect disparate snippets of long-range code, helping the model maintain contextual coherence. Hint Memory focuses on crucial code elements throughout the global context, such as package imports, by integrating a kNN attention layer to…
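To make the kNN attention idea behind Hint Memory more concrete, here is a minimal sketch in plain Python. It is not the EXPO implementation. It only illustrates the general technique of retrieving the k stored keys most similar to a query and attending over just those entries. The function name `knn_attention`, the dot-product scoring, and the toy vectors are all assumptions made for illustration; the abstract does not specify how the memory is stored or scored.

```python
import math


def knn_attention(query, memory_keys, memory_values, k=2):
    """Attend over only the k memory entries whose keys best match the query.

    Illustrative sketch: dot-product similarity and a softmax restricted
    to the retrieved entries are assumptions, not details from the paper.
    """
    # Score every stored key against the query with a dot product.
    scores = [sum(q * x for q, x in zip(query, key)) for key in memory_keys]

    # Keep the indices of the k highest-scoring keys (the "nearest neighbours").
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Softmax over the retrieved scores only; subtract the max for stability.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Output is the attention-weighted sum of the retrieved values.
    dim = len(memory_values[0])
    output = [
        sum(w * memory_values[i][d] for w, i in zip(weights, top))
        for d in range(dim)
    ]
    return output, top, weights


# Toy memory: three stored (key, value) pairs, e.g. standing in for
# representations of important code elements such as imports.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
output, top, weights = knn_attention([1.0, 0.0], keys, values, k=2)
```

In this toy run, only the two keys closest to the query contribute to the output, and the unrelated entry is ignored entirely. That restriction is what lets a model consult a large global memory without attending over every stored item.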
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
