Effective Reformulation of Query for Code Search using Crowdsourced Knowledge and Extra-Large Data Analytics

Mohammad Masudur Rahman; Chanchal K. Roy

arXiv:1807.08798·cs.SE·July 25, 2018

TL;DR

This paper introduces a novel method to reformulate natural language queries for code search by leveraging Stack Overflow data and semantic analysis, significantly improving search relevance and accuracy.

Contribution

The paper presents a new technique that automatically identifies relevant API classes from Stack Overflow and uses semantic proximity to enhance code search queries, outperforming existing methods.
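The reformulation step can be pictured with a minimal sketch. It assumes that reformulating a query means appending the top-ranked API class names to the original keywords; the function name `reformulate_query`, the `top_k` parameter, and the sample class list are hypothetical and are not taken from the paper.

```python
def reformulate_query(query, ranked_api_classes, top_k=3):
    """Append up to top_k API class names to a natural language query.

    Assumption: reformulation is plain keyword expansion. Class names that
    already occur in the query (case-insensitive) are not repeated.
    """
    existing = {token.lower() for token in query.split()}
    additions = [
        api_class
        for api_class in ranked_api_classes[:top_k]
        if api_class.lower() not in existing
    ]
    return " ".join([query.strip()] + additions)


# Hypothetical ranked API classes for an illustrative query
reformulated = reformulate_query(
    "send http request",
    ["HttpURLConnection", "URL", "InputStream", "Socket"],
    top_k=2,
)
print(reformulated)
```

The expanded query carries the vocabulary that source code actually uses, which is how the vocabulary mismatch between a generic query and the indexed code would be reduced.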

Findings

01

The technique identified relevant API classes with 48% precision and 58% recall.

02

The technique outperformed state-of-the-art methods by 32% in precision and 48% in recall.

03

The reformulated queries significantly improved code search results across popular search engines.

Abstract

Software developers frequently issue generic natural language queries for code search while using code search engines (e.g., GitHub native search, Krugle). Such queries often do not lead to any relevant results due to vocabulary mismatch problems. In this paper, we propose a novel technique that automatically identifies relevant and specific API classes from the Stack Overflow Q&A site for a programming task written as a natural language query, and then reformulates the query for improved code search. We first collect candidate API classes from Stack Overflow using pseudo-relevance feedback and two term weighting algorithms, and then rank the candidates using Borda count and semantic proximity between query keywords and the API classes. The semantic proximity has been determined by an analysis of 1.3 million questions and answers from Stack Overflow. Experiments using 310 code search…
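The Borda count step mentioned in the abstract can be sketched as follows. This is a generic Borda count over two ranked candidate lists, not the authors' implementation: the function name `borda_count`, the alphabetical tie-break, and the sample candidate lists are assumptions made for illustration, and the semantic proximity score is left out entirely.

```python
from collections import defaultdict


def borda_count(rankings):
    """Fuse several ranked lists into one ranking with a Borda count.

    In each list of length n, the item at 0-based position p earns n - p
    points. Items missing from a list earn nothing from that list.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position
    # Highest total first; ties broken alphabetically (an assumption here)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical candidates produced by two different term weighting schemes
by_weighting_a = ["HttpURLConnection", "URL", "InputStream", "BufferedReader"]
by_weighting_b = ["URL", "HttpURLConnection", "BufferedReader", "Socket"]

fused = borda_count([by_weighting_a, by_weighting_b])
print(fused)
```

A candidate ranked high by both weighting schemes accumulates the most points, so the fused list favors API classes on which the two schemes agree.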
