Representation of Developer Expertise in Open Source Software
Tapajit Dey, Andrey Karnauch, Audris Mockus

TL;DR
This paper introduces the Skill Space, a vector-based representation of developers, APIs, and projects in open source software, aiming to better reflect expertise at an ecosystem level and improve developer matching.
Contribution
It proposes a novel Skill Space model using Doc2Vec embeddings to represent and analyze developer expertise across the OSS ecosystem.
Findings
Embeddings reflect the intended topology of the Skill Space.
Representations can predict API usage and project participation.
The representations align with developers' self-reported expertise.
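The idea behind these findings, that developers, APIs, and projects share one vector space and that closeness in that space signals relevant expertise, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the entity names, the 3-dimensional vectors, and the `rank_developers` helper are hypothetical, and cosine similarity is used as a common choice for comparing embeddings rather than as the paper's confirmed measure. The actual Skill Space is built from Doc2Vec embeddings trained on World of Code data, which this toy example does not reproduce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical, hand-made vectors standing in for learned embeddings.
# In a real Skill Space these would come from a trained embedding model.
skill_space = {
    "api:numpy": [0.9, 0.1, 0.0],
    "api:react": [0.0, 0.2, 0.9],
    "dev:alice": [0.8, 0.3, 0.1],
    "dev:bob": [0.1, 0.2, 0.8],
    "proj:sci-toolkit": [0.85, 0.2, 0.05],
}


def rank_developers(target, space):
    """Return developer keys ordered from most to least similar to `target`."""
    devs = [key for key in space if key.startswith("dev:")]
    return sorted(devs, key=lambda d: cosine(space[d], space[target]), reverse=True)


# Which developers look closest to the needs of a (hypothetical) project?
ranking = rank_developers("proj:sci-toolkit", skill_space)
print(ranking)
```

Because APIs, developers, and projects live in the same space, the same helper can rank developers against an API key such as `"api:react"` instead of a project, which mirrors the API-usage prediction mentioned above.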
Abstract
Background: Accurate representation of developer expertise has always been an important research problem. While a number of studies proposed novel methods of representing expertise within individual projects, these methods are difficult to apply at an ecosystem level. However, with the focus of software development shifting from monolithic to modular, a method of representing developers' expertise in the context of the entire OSS development becomes necessary when, for example, a project tries to find new maintainers and looks for developers with relevant skills. Aim: We aim to address this knowledge gap by proposing and constructing the Skill Space where each API, developer, and project is represented and postulate how the topology of this space should reflect what developers know (and projects need). Method: We use the World of Code infrastructure to extract the complete set of APIs in…
