Exploring Software Reusability Metrics with Q&A Forum Data

Matthew T. Patrick

arXiv:2005.08845 · cs.SE · May 19, 2020

TL;DR

This paper presents LANLAN, a machine learning approach using word embeddings to analyze StackOverflow Q&A data, distinguishing problem reports from support requests to improve understanding of software reusability.

Contribution

Introduces LANLAN, a novel method leveraging word embeddings and machine learning to analyze unstructured Q&A forum data for insights into software reuse difficulties.
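As a rough illustration of how word embeddings can turn free-text forum messages into something a classifier can work with, the sketch below averages per-word vectors into a message vector and assigns the closer of two class centroids. Everything in it is an assumption made for illustration: the three-dimensional `EMBEDDINGS` table, the helper names, and the nearest-centroid rule are hypothetical stand-ins, not LANLAN's actual embeddings or model.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-made for this sketch.
# Real embeddings would be trained on a large corpus.
EMBEDDINGS = {
    "crash": [0.9, 0.1, 0.0],
    "exception": [0.8, 0.2, 0.1],
    "bug": [0.9, 0.0, 0.1],
    "how": [0.1, 0.9, 0.0],
    "install": [0.0, 0.8, 0.2],
    "use": [0.1, 0.9, 0.1],
}


def message_vector(text):
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    vectors = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


# Class centroids built from a few seed words (an assumption of this sketch).
problem_centroid = message_vector("crash exception bug")
support_centroid = message_vector("how install use")


def classify(text):
    """Label a message by whichever centroid its averaged vector is closer to."""
    v = message_vector(text)
    if cosine(v, problem_centroid) >= cosine(v, support_centroid):
        return "problem report"
    return "support request"
```

Calling `classify("exception crash on startup")` labels the message a problem report, while `classify("how to install")` labels it a support request. A trained classifier over such vectors would normally replace the centroid rule.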

Findings

1. Achieved an AUROC of over 0.9 in identifying problem reports and support requests.

2. Demonstrated that Q&A data can inform software reusability metrics.

3. LANLAN predicts future user difficulties effectively.
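For readers unfamiliar with AUROC, it can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. The labels and scores are hypothetical toy values, not results from the paper, and the function name `auroc` is chosen only for this illustration.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = problem report, 0 = support request.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
example_auroc = auroc(labels, scores)  # 8 of 9 pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic, so a rank-based implementation would be preferable for large datasets.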

Abstract

Question and answer (Q&A) forums contain valuable information regarding software reuse, but they can be challenging to analyse due to their unstructured free text. Here we introduce a new approach (LANLAN), using word embeddings and machine learning, to harness information available in StackOverflow. Specifically, we consider two different kinds of user communication describing difficulties encountered in software reuse: 'problem reports' point to potential defects, while 'support requests' ask for clarification on software usage. Word embeddings were trained on 1.6 billion tokens from StackOverflow and applied to identify which Q&A forum messages (from two large open source projects: Eclipse and Bioconductor) correspond to problem reports or support requests. LANLAN achieved an area under the receiver operating characteristic curve (AUROC) of over 0.9; it can be used to explore the relationship…

Taxonomy

Topics: Software Engineering Research · Scientific Computing and Data Management · Open Source Software Innovations