# Identifying collaborators in large codebases

**Authors:** Waren Long, Vadim Markovtsev, Hugo Mougard, Egor Bulychev, Jan Hula

arXiv: 1905.06782 · 2019-05-17

## TL;DR

This paper embeds and clusters contributors' commit activity, programming language usage, and code identifier topics to expose how developers actually collaborate in large codebases. Evaluated on 117 open source GitLab projects, the method restores the engineering organization in broad strokes and reveals hidden coding collaborations.

## Contribution

It introduces an approach that embeds and clusters contributors' commit activity, programming language usage, and code identifier topics to uncover the organically formed collaboration structure of a large software organization.

## Key findings

- Restored GitLab's engineering organization in broad strokes from 117 of its open source projects
- Revealed hidden coding collaborations
- Helped justify in-house technical decisions

## Abstract

The way developers collaborate inside and particularly across teams often escapes management's attention, despite a formal organization with designated teams being defined. Observability of the actual, organically formed engineering structure provides decision makers invaluable additional tools to manage their talent pool. To identify existing inter- and intra-team interactions - and suggest relevant opportunities for suitable collaborations - this paper studies contributors' commit activity, usage of programming languages, and code identifier topics by embedding and clustering them. We evaluate our findings in collaboration with the GitLab organization, analyzing 117 of their open source projects. We show that we are able to restore their engineering organization in broad strokes, and also reveal hidden coding collaborations as well as justify in-house technical decisions.
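
To make the "embed, then cluster" idea concrete, here is a minimal sketch covering only the programming language signal. It is not the authors' pipeline: the toy data in `ACTIVITY`, the function names, the cosine similarity measure, the `threshold` value, and the connected-components grouping are all assumptions chosen for illustration. The paper's actual embedding and clustering choices are described in the full text.

```python
import math
from collections import defaultdict

# Hypothetical toy data: lines changed per contributor, per language.
ACTIVITY = {
    "alice": {"Ruby": 900, "JavaScript": 100},
    "bob": {"Ruby": 700, "JavaScript": 250},
    "carol": {"Go": 800, "Shell": 150},
    "dave": {"Go": 600, "Shell": 300},
}


def to_unit_vector(counts, languages):
    """Embed language counts as an L2-normalized vector over a fixed language order."""
    vec = [float(counts.get(lang, 0)) for lang in languages]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u, v):
    """Cosine similarity of two unit vectors, which reduces to a dot product."""
    return sum(a * b for a, b in zip(u, v))


def cluster_contributors(activity, threshold=0.8):
    """Group contributors whose language profiles are similar.

    Assumed rule: two contributors are linked when their cosine similarity
    is at least `threshold`; clusters are the connected components.
    """
    languages = sorted({lang for counts in activity.values() for lang in counts})
    names = sorted(activity)
    vectors = {name: to_unit_vector(activity[name], languages) for name in names}

    # Union-find over contributor names.
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for name in names:
        groups[find(name)].append(name)
    return sorted(sorted(group) for group in groups.values())


clusters = cluster_contributors(ACTIVITY)
print(clusters)
```

On this toy input the two Ruby-heavy contributors land in one group and the two Go-heavy contributors in another. Raising `threshold` splits groups apart, which is one simple way to see how sensitive the recovered structure is to the similarity cutoff.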

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.06782/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1905.06782/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/1905.06782/full.md

---
Source: https://tomesphere.com/paper/1905.06782