Automatic link extraction: The good, the bad and the ugly in software ecosystem mining
Eleni Constantinou, Tom Mens

TL;DR
This paper discusses the challenges and pitfalls of automatic link extraction in software ecosystem mining, based on manual investigation of RubyGems data, and proposes automation to improve dataset completeness.
Contribution
It identifies common pitfalls in automatic link extraction from software repositories and suggests automation techniques to enhance data quality for ecosystem analysis.
Findings
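Manually investigating links in RubyGems package manager metadata revealed pitfalls in automatic link extraction.
Automating the link extraction approach can avoid these pitfalls and produce more complete datasets.
More complete datasets help researchers investigate the multi-platform evolution of software ecosystems.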
Abstract
This abstract presents the pitfalls of automatic link extraction, based on our experience of manually investigating links in the RubyGems package manager metadata. This work can lead to automating the link extraction approach so as to avoid these pitfalls and produce more complete datasets for researchers investigating the multi-platform evolution of software ecosystems.
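To make the idea of link extraction from package metadata concrete, here is a minimal Python sketch. It is not the authors' method: the field names (`source_code_uri`, `homepage_uri`, `bug_tracker_uri`) are assumed to follow the RubyGems.org API gem record, the function `extract_repo` is hypothetical, and the cases it handles (extra path segments, a `.git` suffix, mixed case, non-repository pages) are illustrative assumptions rather than the pitfalls reported in the paper.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumed metadata fields, checked in order of how likely they are
# to point at the source repository (an illustrative choice).
CANDIDATE_FIELDS = ("source_code_uri", "homepage_uri", "bug_tracker_uri")


def extract_repo(metadata: dict) -> Optional[str]:
    """Return 'owner/repo' for the first GitHub link found, else None."""
    for field in CANDIDATE_FIELDS:
        url = metadata.get(field)
        if not url:
            continue  # field missing or empty
        parsed = urlparse(url.strip())
        host = (parsed.hostname or "").lower()
        if host not in ("github.com", "www.github.com"):
            continue  # not a GitHub link (or no scheme, so no host parsed)
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) < 2:
            continue  # a user or organisation page, not a repository
        owner, repo = parts[0], parts[1]
        if repo.endswith(".git"):
            repo = repo[:-4]  # drop the clone-URL suffix
        # GitHub names are case-insensitive, so normalise for matching.
        return f"{owner}/{repo}".lower()
    return None
```

For example, a record whose `source_code_uri` is `https://github.com/rails/rails/tree/v7.0.0` yields `rails/rails`, while a record whose only link is `https://github.com/foo` yields `None`. A URL written without a scheme (such as `github.com/foo/bar`) is also skipped by this sketch, which is the kind of silent miss that makes a manual check of extracted links worthwhile.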
Taxonomy
Topics: Software Engineering Research · Scientific Computing and Data Management · Open Source Software Innovations
