Investigating Notable Metadata Practices in PyPI Libraries: An Empirical Study about Repository and Donation Platform URLs
Alexandros Tsakpinis, Nicolas Raube, Alexander Pretschner

TL;DR
This empirical study investigates the quality and practices of metadata in PyPI libraries, focusing on repository and donation platform links, and evaluates a large language model-based topic modeling approach for analyzing survey data.
Contribution
It provides insights into why repository and donation links are often missing or outdated in PyPI libraries and validates a robust LLM-based topic modeling method for qualitative analysis.
Findings
Missing links are often due to oversight or lack of awareness.
Platform dominance is driven by ideological and organizational factors.
LLM-based topic modeling showed high robustness and quality.
Abstract
Background: Open source software (OSS) libraries are critical components of modern software systems, yet their metadata, particularly links to source code repositories and donation platforms, is often incomplete, outdated, or inconsistent. Such deficiencies hinder dependency monitoring, security assessment, and the sustainability of OSS projects. Aims: This study aims to explain notable metadata practices in PyPI libraries, focusing on platform dominance, outdated links, and missing references to repositories and donation platforms. As this investigation relies on large-scale qualitative survey data, we further evaluate the robustness and quality of the LLM-based topic modeling approach used to derive the findings. Method: We conducted two surveys targeting PyPI authors and maintainers, collecting 1,776 open-ended responses. To analyze these responses, we developed an LLM-based topic…
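To make the notion of "missing repository or donation links" concrete, the following is a minimal sketch of how a library's project URLs could be checked. It is not the authors' tooling. It assumes the metadata is available as a `project_urls` mapping of label to URL, as exposed under `info` in PyPI's JSON API. The host lists and the function names `classify_url`, `summarize_project_urls`, and `missing_links` are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical host lists for illustration; the study's own lists are not given here.
REPOSITORY_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}
DONATION_HOSTS = {"opencollective.com", "patreon.com", "ko-fi.com", "liberapay.com"}


def classify_url(url):
    """Label a URL as 'repository', 'donation', or 'other' based on its host."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # GitHub Sponsors pages live on github.com, so check the path first.
    if host == "github.com" and parsed.path.startswith("/sponsors/"):
        return "donation"
    if host in DONATION_HOSTS:
        return "donation"
    if host in REPOSITORY_HOSTS:
        return "repository"
    return "other"


def summarize_project_urls(project_urls):
    """Group the URLs of a label -> URL mapping by link kind."""
    found = {"repository": [], "donation": [], "other": []}
    # The mapping may be absent (None) when a library declares no project URLs.
    for url in (project_urls or {}).values():
        found[classify_url(url)].append(url)
    return found


def missing_links(project_urls):
    """Return which of the two link kinds are absent from the metadata."""
    summary = summarize_project_urls(project_urls)
    return [kind for kind in ("repository", "donation") if not summary[kind]]


if __name__ == "__main__":
    example = {
        "Homepage": "https://example.org",
        "Source": "https://github.com/psf/requests",
    }
    print(missing_links(example))
```

A host-based check like this only detects whether a link is present. It cannot tell whether a link is outdated, which would require resolving the URL and comparing it with the project's current location.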
