Self-Supervised Query Reformulation for Code Search

Yuetian Mao; Chengcheng Wan; Yuze Jiang; Xiaodong Gu

arXiv:2307.00267 · cs.SE · July 4, 2023 · 1 citation

TL;DR

This paper introduces SSQR, a self-supervised method for query reformulation in code search. It builds on pre-trained models, requires no parallel query data, outperforms unsupervised baselines, and is competitive with supervised methods.

Contribution

The paper proposes SSQR, a novel self-supervised approach using a masked language modeling task on unannotated queries to improve code search query reformulation.
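To make the masked language modeling idea concrete, here is a minimal sketch of how training pairs could be built from a single unannotated query. It assumes the T5-style span-corruption format, in which a sentinel token such as `<extra_id_0>` replaces the masked span in the input and introduces the masked words in the target. The summary above does not spell out the exact input format SSQR uses, so the helper names `mask_span` and `all_single_span_examples` and the whitespace tokenization are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# Sentinel naming used by T5-style tokenizers; whether SSQR uses exactly
# this format is an assumption made for illustration.
SENTINEL = "<extra_id_{}>"


def mask_span(query: str, start: int, length: int) -> Tuple[str, str]:
    """Mask `length` whitespace tokens of `query` beginning at index `start`.

    Returns (masked_input, target) in T5 span-corruption style.
    """
    tokens = query.split()
    if length < 1 or start < 0 or start + length > len(tokens):
        raise ValueError("span must lie inside the query")
    # Input: the span is replaced by a single sentinel token.
    masked = tokens[:start] + [SENTINEL.format(0)] + tokens[start + length:]
    # Target: the sentinel, the hidden tokens, then a closing sentinel.
    target = (
        [SENTINEL.format(0)]
        + tokens[start:start + length]
        + [SENTINEL.format(1)]
    )
    return " ".join(masked), " ".join(target)


def all_single_span_examples(query: str, span_len: int = 1) -> List[Tuple[str, str]]:
    """Build one training pair per possible span position in the query."""
    n_tokens = len(query.split())
    return [mask_span(query, i, span_len) for i in range(n_tokens - span_len + 1)]


if __name__ == "__main__":
    for masked_input, target in all_single_span_examples("convert string to int"):
        print(masked_input, "->", target)
```

Because each pair is derived from the query itself, no reformulated counterpart is needed, which is the sense in which the training signal is self-supervised.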

Findings

01 SSQR outperforms unsupervised baselines significantly.

02 SSQR achieves competitive performance with supervised methods.

03 The method effectively leverages pre-trained models without needing parallel data.

Abstract

Automatic query reformulation is a widely utilized technology for enriching user requirements and enhancing the outcomes of code search. It can be conceptualized as a machine translation task, wherein the objective is to rephrase a given query into a more comprehensive alternative. While showing promising results, training such a model typically requires a large parallel corpus of query pairs (i.e., the original query and a reformulated query) that are confidential and unpublished by online code search engines. This restricts its practicality in software development processes. In this paper, we propose SSQR, a self-supervised query reformulation method that does not rely on any parallel query corpus. Inspired by pre-trained models, SSQR treats query reformulation as a masked language modeling task conducted on an extensive unannotated corpus of queries. SSQR extends T5 (a…

Code & Models

Repositories

redsmallpanda/ssqr · PyTorch · Official

Taxonomy

Topics: Web Data Mining and Analysis · Software Engineering Research · Natural Language Processing Techniques