Sampling in Software Engineering Research: A Critical Review and Guidelines
Sebastian Baltes, Paul Ralph

TL;DR
This paper critically reviews sampling practices in software engineering research, highlighting the rarity of representative and random sampling, and offers guidelines to improve sampling quality and understanding.
Contribution
It critically reviews the state of sampling in recent software engineering research, synthesizes existing knowledge of sampling into a succinct primer, and proposes guidelines for improving the conduct, presentation, and evaluation of sampling.
Findings
Random sampling is rarely used.
Sophisticated sampling strategies are very rare.
Sampling, representativeness, and randomness often appear to be misunderstood.
Abstract
Representative sampling appears rare in empirical software engineering research. Not all studies need representative samples, but a general lack of representative sampling undermines a scientific field. This article therefore reports a critical review of the state of sampling in recent, high-quality software engineering research. The key findings are: (1) random sampling is rare; (2) sophisticated sampling strategies are very rare; (3) sampling, representativeness and randomness often appear misunderstood. These findings suggest that software engineering research has a generalizability crisis. To address these problems, this paper synthesizes existing knowledge of sampling into a succinct primer and proposes extensive guidelines for improving the conduct, presentation and evaluation of sampling in software engineering research. It is further recommended that while researchers should…
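To make the distinction between random sampling and a more sophisticated strategy concrete, here is a minimal Python sketch contrasting simple random sampling with proportionally allocated stratified sampling. It is an illustration only, not the authors' method: the function names, the fixed seed, and the toy population of projects labeled by language are all assumptions introduced for this example.

```python
import random
from collections import defaultdict


def simple_random_sample(population, k, seed=0):
    """Draw k items without replacement; every item is equally likely."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return rng.sample(population, k)


def stratified_sample(population, key, k, seed=0):
    """Draw about k items, allocating draws to each stratum in proportion
    to its share of the population. Rounding can make the total differ
    slightly from k when the shares do not divide evenly."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)

    sample = []
    for label in sorted(strata):  # sorted for a deterministic order
        members = strata[label]
        n = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample


# Hypothetical sampling frame: 100 projects, 80 labeled Java, 20 labeled Rust.
projects = [(f"p{i}", "Java" if i < 80 else "Rust") for i in range(100)]

srs = simple_random_sample(projects, 10)
strat = stratified_sample(projects, key=lambda p: p[1], k=10)
```

With simple random sampling, the number of Rust projects in a sample of 10 varies from draw to draw; with stratified sampling it is fixed by the allocation. Neither procedure by itself makes a sample representative of a population that the sampling frame does not cover.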
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Open Source Software Innovations
