UniCoR: Modality Collaboration for Robust Cross-Language Hybrid Code Retrieval
Yang Yang, Li Kuang, Jiakun Liu, Zhongxin Liu, Yingjie Xia, David Lo

TL;DR
UniCoR introduces a self-supervised framework that enhances hybrid code retrieval by learning unified, semantically robust, and language-agnostic representations, significantly improving cross-language and hybrid query performance.
Contribution
The paper proposes UniCoR, a novel self-supervised learning framework that improves semantic understanding, modality fusion, and cross-language generalization in hybrid code retrieval.
Findings
Achieves 8.64% higher MRR and 11.54% higher MAP over baselines.
Demonstrates stability in hybrid retrieval tasks.
Shows strong generalization in cross-language scenarios.
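For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how MRR (mean reciprocal rank) and MAP (mean average precision) are commonly computed from ranked retrieval results. It is not taken from the paper's evaluation code. The function names and the boolean relevance-list input format are illustrative assumptions, and average precision is normalized here by the number of relevant items that appear in the ranked list.

```python
from typing import Callable, Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def average_precision(relevance: Sequence[bool]) -> float:
    """Average the precision values measured at each relevant position.

    Assumption: every relevant item appears somewhere in the ranked list,
    so the number of hits is used as the normalizer.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_metric(
    metric: Callable[[Sequence[bool]], float],
    runs: Sequence[Sequence[bool]],
) -> float:
    """Average a per-query metric over all queries (0.0 for an empty set)."""
    return sum(metric(run) for run in runs) / len(runs) if runs else 0.0


# Two hypothetical queries; True marks a relevant result at that rank.
runs = [
    [False, True, False, True],
    [True, False, False, False],
]
mrr = mean_metric(reciprocal_rank, runs)
map_score = mean_metric(average_precision, runs)
print(f"MRR={mrr:.4f} MAP={map_score:.4f}")
```

MRR rewards only the position of the first relevant result, whereas MAP accounts for the positions of all relevant results, which is why the two are usually reported together.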
Abstract
Effective code retrieval is indispensable, and searching for code in hybrid mode, using both natural language and code snippets, has become an important paradigm. Nevertheless, it remains unclear whether existing approaches can effectively leverage such hybrid queries, particularly in cross-language contexts. We conduct a comprehensive empirical study of representative code models and reveal three challenges: (1) insufficient semantic understanding; (2) inefficient fusion in hybrid code retrieval; and (3) weak generalization in cross-language scenarios. To address these challenges, we propose UniCoR, a novel self-supervised framework designed to learn Unified Code Representations that are semantically robust, modally collaborative, and language-agnostic. Firstly, we design a multi-perspective supervised contrastive learning module to enhance semantic understanding and modality fusion. It…
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Web Data Mining and Analysis
