InEx-Bug: A Human Annotated Dataset of Intrinsic and Extrinsic Bugs in the NPM Ecosystem
Tanner Wright, Adams Chen, Gema Rodríguez-Pérez

TL;DR
This paper introduces InEx-Bug, a manually annotated dataset of 377 GitHub issues from NPM projects that distinguishes intrinsic bugs (internal defects) from extrinsic issues (caused by dependencies or the environment), with detailed metadata and analysis.
Contribution
The paper presents a novel, richly annotated dataset that categorizes bugs as intrinsic or extrinsic in the NPM ecosystem, enabling deeper defect analysis.
Findings
Intrinsic bugs resolve faster than extrinsic bugs.
Intrinsic bugs are more likely to be closed with code changes.
Extrinsic bugs have higher reopen rates and longer recurrence times.
Abstract
Understanding the causes of software defects is essential for reliable software maintenance and ecosystem stability. However, existing bug datasets do not distinguish between issues originating within a project and those caused by external dependencies or environmental factors. In this paper we present InEx-Bug, a manually annotated dataset of 377 GitHub issues from 103 NPM repositories, categorizing issues as Intrinsic (internal defect), Extrinsic (dependency/environment issue), Not-a-Bug, or Unknown. Beyond labels, the dataset includes rich temporal and behavioral metadata such as maintainer participation, code changes, and reopening patterns. Analyses show that Intrinsic bugs resolve faster (median 8.9 vs 10.2 days), are closed more often (92% vs 78%), and require code changes more frequently (57% vs 28%) than Extrinsic bugs. While Extrinsic bugs exhibit higher reopen rates (12% vs…
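The four-way label scheme and the per-category comparisons described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' analysis code: the `Issue` fields (`days_to_close`, `has_code_change`), the `summarize` helper, and the toy records are all assumptions made for the example, and the dataset's actual column names and values may differ.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import median
from typing import Optional


class Label(Enum):
    # The four categories named in the abstract.
    INTRINSIC = "Intrinsic"
    EXTRINSIC = "Extrinsic"
    NOT_A_BUG = "Not-a-Bug"
    UNKNOWN = "Unknown"


@dataclass
class Issue:
    # Hypothetical fields; the real dataset's schema may differ.
    label: Label
    days_to_close: Optional[float]  # None if the issue is still open
    has_code_change: bool


def summarize(issues, label):
    """Compute simple per-category statistics for one label."""
    group = [i for i in issues if i.label is label]
    closed = [i.days_to_close for i in group if i.days_to_close is not None]
    n = len(group)
    return {
        "count": n,
        "median_days_to_close": median(closed) if closed else None,
        "closed_rate": len(closed) / n if n else None,
        "code_change_rate": sum(i.has_code_change for i in group) / n if n else None,
    }


# Toy records invented for illustration only (not taken from InEx-Bug).
toy_issues = [
    Issue(Label.INTRINSIC, 2.0, True),
    Issue(Label.INTRINSIC, 6.0, True),
    Issue(Label.INTRINSIC, 10.0, False),
    Issue(Label.EXTRINSIC, 4.0, False),
    Issue(Label.EXTRINSIC, None, False),
    Issue(Label.NOT_A_BUG, 1.0, False),
]

intrinsic_summary = summarize(toy_issues, Label.INTRINSIC)
extrinsic_summary = summarize(toy_issues, Label.EXTRINSIC)
```

Comparing `intrinsic_summary` with `extrinsic_summary` mirrors the kind of side-by-side reading the abstract reports (median time to close, share closed, share with code changes), though the toy numbers here carry no empirical meaning.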
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
