Self-Supervised Query Reformulation for Code Search
Yuetian Mao, Chengcheng Wan, Yuze Jiang, Xiaodong Gu

TL;DR
This paper introduces SSQR, a self-supervised query reformulation method for code search that builds on a pre-trained model and requires no parallel query data. It outperforms unsupervised baselines and is competitive with supervised methods.
Contribution
The paper proposes SSQR, a novel self-supervised approach that frames query reformulation for code search as a masked language modeling task over unannotated queries.
Findings
SSQR outperforms unsupervised baselines significantly.
SSQR achieves competitive performance with supervised methods.
The method effectively leverages pre-trained models without needing parallel data.
Abstract
Automatic query reformulation is a widely utilized technology for enriching user requirements and enhancing the outcomes of code search. It can be conceptualized as a machine translation task, wherein the objective is to rephrase a given query into a more comprehensive alternative. While showing promising results, training such a model typically requires a large parallel corpus of query pairs (i.e., the original query and a reformulated query) that are confidential and unpublished by online code search engines. This restricts its practicality in software development processes. In this paper, we propose SSQR, a self-supervised query reformulation method that does not rely on any parallel query corpus. Inspired by pre-trained models, SSQR treats query reformulation as a masked language modeling task conducted on an extensive unannotated corpus of queries. SSQR extends T5 (a…
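To make the masked language modeling framing concrete, here is a minimal sketch of how an unannotated query could be turned into a training pair without any parallel data. It assumes the standard T5 span-corruption format with sentinel tokens (`<extra_id_0>`, `<extra_id_1>`). The function names `mask_span` and `insertion_candidates` are hypothetical, and the inference-time step of placing a sentinel at each gap is an assumption made for illustration, not a description of the exact procedure in the paper.

```python
from typing import List, Tuple

SENTINEL = "<extra_id_0>"      # T5-style sentinel marking the masked span
END_SENTINEL = "<extra_id_1>"  # closes the target sequence in T5's format


def mask_span(query: str, start: int, length: int) -> Tuple[str, str]:
    """Build one (corrupted query, target) pair from a plain query.

    Words start .. start+length-1 are replaced by a single sentinel.
    The target lists the sentinel followed by the hidden words.
    """
    words = query.split()
    if length < 1 or start < 0 or start + length > len(words):
        raise ValueError("span must lie inside the query")
    corrupted = words[:start] + [SENTINEL] + words[start + length:]
    target = [SENTINEL] + words[start:start + length] + [END_SENTINEL]
    return " ".join(corrupted), " ".join(target)


def insertion_candidates(query: str) -> List[str]:
    """Assumed inference step: put a sentinel at every gap in the query.

    A model trained on pairs from mask_span could then propose words
    for each position, and the caller would pick among the expansions.
    """
    words = query.split()
    return [
        " ".join(words[:i] + [SENTINEL] + words[i:])
        for i in range(len(words) + 1)
    ]


if __name__ == "__main__":
    source, target = mask_span("convert string to integer", start=1, length=1)
    print(source)
    print(target)
    for candidate in insertion_candidates("read file"):
        print(candidate)
```

Because the pair is derived from a single query, the only data needed is a corpus of queries, which is what removes the dependence on original/reformulated query pairs.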
Taxonomy
Topics: Web Data Mining and Analysis · Software Engineering Research · Natural Language Processing Techniques
