Understanding Open Source Contributor Profiles in Popular Machine Learning Libraries
Jiawen Liu, Haoxiang Zhang, Ying Zou

TL;DR
This study analyzes open-source contributor profiles in popular machine learning libraries by examining their activities, work preferences, and engagement patterns to better understand their roles and impact on project success.
Contribution
The paper introduces a novel profiling approach based on repository activity data, identifying four distinct contributor profiles and analyzing their influence on OSS project dynamics.
Findings
Four contributor profiles identified: Core-Afterhour, Core-Workhour, Peripheral-Afterhour, Peripheral-Workhour.
Significant features include project experience, authored files, collaborations, and location.
Long-term contributors tend to make fewer, more balanced, and less technical contributions.
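To make the four profile names concrete, here is a minimal rule-based sketch of how a contributor could be sorted into one of them from commit timestamps. It is an illustration only, not the authors' method: the paper derives profiles from repository activity data, whereas the thresholds here (`WORK_START`, `WORK_END`, `CORE_SHARE`) and the function names (`is_workhour`, `label_profile`) are hypothetical and chosen purely for the example.

```python
from datetime import datetime

# Hypothetical thresholds for illustration; none of these come from the paper.
WORK_START, WORK_END = 9, 17   # assumed local working hours, Monday to Friday
CORE_SHARE = 0.05              # assumed share of project commits that counts as "core"


def is_workhour(ts: datetime) -> bool:
    """Return True if the timestamp falls on a weekday within working hours."""
    return ts.weekday() < 5 and WORK_START <= ts.hour < WORK_END


def label_profile(commit_times, project_commit_total):
    """Label a contributor as Core/Peripheral and Workhour/Afterhour.

    commit_times: list of datetimes (assumed already in the contributor's local time).
    project_commit_total: total number of commits in the project.
    """
    if not commit_times or project_commit_total <= 0:
        raise ValueError("need at least one commit and a positive project total")

    # Fraction of the project's commits made by this contributor.
    share = len(commit_times) / project_commit_total
    # Fraction of this contributor's commits made during working hours.
    work_ratio = sum(is_workhour(t) for t in commit_times) / len(commit_times)

    role = "Core" if share >= CORE_SHARE else "Peripheral"
    timing = "Workhour" if work_ratio >= 0.5 else "Afterhour"
    return f"{role}-{timing}"
```

A real analysis would also need time-zone handling (the findings list location as a significant feature) and would likely learn the groupings from data rather than fix cut-offs by hand.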
Abstract
With the increasing popularity of machine learning (ML), many open-source software (OSS) contributors are attracted to developing and adopting ML approaches. A comprehensive understanding of ML contributors is crucial for successful ML OSS development and maintenance. Without such knowledge, there is a risk of inefficient resource allocation and hindered collaboration in ML OSS projects. Existing research focuses on understanding the difficulties and challenges perceived by ML contributors through user surveys. There is a lack of understanding of ML contributors based on their activities tracked from software repositories. In this paper, we aim to understand ML contributors by identifying contributor profiles in ML libraries. We further study contributors' OSS engagement from three aspects: workload composition, work preferences, and technical importance. By investigating 7,640 contributors…
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices
