Heterogeneous Metric Learning with Content-based Regularization for Software Artifact Retrieval

Liang Wu; Hui Xiong; Liang Du; Bo Liu; Guandong Xu; Yong Ge; Yanjie Fu; Yuanchun Zhou; Jianhui Li

arXiv:1409.7165·cs.LG·November 15, 2016

TL;DR

This paper introduces a heterogeneous metric learning approach that combines code and text features into a unified semantic space, significantly improving software artifact retrieval accuracy.

Contribution

The paper develops a novel feature extraction method for source code and a heterogeneous metric learning model that integrates code and text features for better retrieval.

Findings

1. Enhanced retrieval performance on real-world datasets
2. Significant improvement over existing methods
3. Effective integration of code and text features

Abstract

The goal of software artifact retrieval is to effectively locate software artifacts, such as a piece of source code, in a large code repository. This problem has traditionally been addressed with textual queries: information retrieval techniques are applied based on the textual similarity between a query and the textual representation of each software artifact, which is generated by collecting words from the comments, identifiers, and descriptions of programs. However, in addition to this semantic information, there is rich information embedded in the source code itself. Source code, if analyzed properly, can be a rich source for enhancing software artifact retrieval. To this end, we develop a feature extraction method for source code. Specifically, this method can capture both the inherent information in the source…
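The traditional text-based retrieval that the abstract describes can be sketched roughly as follows. This is a minimal illustration of that baseline only, not the paper's heterogeneous metric learning model. The TF-IDF weighting, the cosine similarity measure, the function names, and the sample artifacts are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical artifacts: each maps a file name to its comments and identifiers.
ARTIFACTS = {
    "JsonParser.java": "// Parse a JSON string into a tree\n"
                       "class JsonParser { Node parseJson(String input) }",
    "FileCopier.java": "// Copy a file between directories\n"
                       "class FileCopier { void copyFile(Path src, Path dst) }",
    "HttpClient.java": "// Send an HTTP request and read the response\n"
                       "class HttpClient { Response sendRequest(String url) }",
}


def tokenize(text):
    """Collect lowercase words, splitting camelCase identifiers into parts."""
    # Insert a space at lower-to-upper boundaries: "parseJson" -> "parse Json".
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed choice); any standard variant would do here.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, artifacts):
    """Rank artifacts by textual similarity to the query, best match first."""
    names = list(artifacts)
    vectors, idf = tfidf_vectors([tokenize(artifacts[n]) for n in names])
    # Query terms that never occur in the collection carry no weight.
    counts = Counter(tokenize(query))
    q = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [(cosine(q, v), name) for name, v in zip(names, vectors)]
    return sorted(scored, key=lambda pair: -pair[0])


if __name__ == "__main__":
    for score, name in rank("parse json string", ARTIFACTS):
        print(f"{score:.3f}  {name}")
```

A sketch like this sees only the words collected from comments and identifiers. It ignores the structure of the code itself, which is the gap the abstract says the proposed feature extraction method is meant to address.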
