Analyzing GitHub Issues and Pull Requests in nf-core Pipelines: Insights into nf-core Pipeline Repositories
Khairul Alam, Banani Roy

TL;DR
This study analyzes over 25,000 issues and pull requests in nf-core pipelines to identify common challenges, management practices, and factors influencing resolution times, providing insights to improve pipeline development and maintenance.
Contribution
The paper offers the first large-scale empirical analysis of issues and pull requests in nf-core pipelines, using topic modeling and statistical methods to identify key challenges and the factors that affect issue resolution.
Findings
89.38% of issues are eventually closed
Half of issues are resolved within 3 days
Labels and code snippets significantly increase resolution likelihood
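The first two findings are summary statistics over issue timestamps. As a rough illustration only, the sketch below shows one way such figures could be computed; the `issues` records, the function names, and the choice to take the median over closed issues only are assumptions made for this example and are not taken from the paper, which may use a different method (for example, one that also accounts for still-open issues).

```python
from datetime import datetime
from statistics import median

# Hypothetical issue records: (created_at, closed_at), with closed_at = None
# for issues that are still open. These values are made up for illustration.
issues = [
    (datetime(2024, 1, 1), datetime(2024, 1, 2)),
    (datetime(2024, 1, 3), datetime(2024, 1, 6)),
    (datetime(2024, 1, 5), datetime(2024, 1, 15)),
    (datetime(2024, 1, 7), None),
]

def closure_rate(records):
    """Percentage of issues that have a closing timestamp."""
    closed = sum(1 for _, closed_at in records if closed_at is not None)
    return 100.0 * closed / len(records)

def median_resolution_days(records):
    """Median open-to-close duration in days, over closed issues only."""
    durations = [
        (closed_at - created_at).total_seconds() / 86400
        for created_at, closed_at in records
        if closed_at is not None
    ]
    return median(durations)

rate = closure_rate(issues)
med = median_resolution_days(issues)
print(f"closed: {rate:.2f}%  median resolution: {med:.1f} days")
```

In practice the timestamps would come from each repository's issue tracker rather than a hard-coded list.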
Abstract
Scientific Workflow Systems (SWSs) such as Nextflow have become essential software frameworks for conducting reproducible, scalable, and portable computational analyses in data-intensive fields like genomics, transcriptomics, and proteomics. Building on Nextflow, the nf-core community curates standardized, peer-reviewed pipelines that follow strict testing, documentation, and governance guidelines. Despite its widespread adoption, little is known about the challenges users face in developing and maintaining these pipelines. This paper presents an empirical study of 25,173 issues and pull requests from these pipelines to uncover recurring challenges, management practices, and perceived difficulties. Using BERTopic modeling, we identify 13 key challenges, including pipeline development and integration, bug fixing, integrating genomic data, managing CI configurations, and handling version…
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Genomics and Phylogenetic Studies
