Bad Smells in Software Analytics Papers

Tim Menzies; Martin Shepperd

arXiv:1803.05518·cs.SE·April 16, 2019

TL;DR

This paper identifies 12 common issues, termed 'bad smells', in software analytics research papers, aiming to improve study reliability and guide both producers and consumers of such studies.

Contribution

It introduces the 'bad smells' metaphor to diagnose and discuss potential problems in software analytics research papers, providing a basis for improving study quality.

Findings

01 List of 12 'bad smells' with examples

02 Impact of 'bad smells' on study validity demonstrated

03 Encourages ongoing debate on research validity

Abstract

CONTEXT: There has been a rapid growth in the use of data analytics to underpin evidence-based software engineering. However, the combination of complex techniques, diverse reporting standards and poorly understood underlying phenomena is causing some concern as to the reliability of studies. OBJECTIVE: Our goal is to provide guidance for producers and consumers of software analytics studies (computational experiments and correlation studies). METHOD: We propose using "bad smells", i.e., surface indications of deeper problems, a notion popular in the agile software community, and consider how they may manifest in software analytics studies. RESULTS: We list 12 "bad smells" in software analytics papers (and show their impact by examples). CONCLUSIONS: We believe the metaphor of bad smell is a useful device. Therefore, we encourage more debate on what contributes to the validity of…

Taxonomy

Topics: Software Engineering Research · Software System Performance and Reliability · Scientific Computing and Data Management