Efficiency for Experts, Visibility for Newcomers: A Case Study of Label-Code Alignment in Kubernetes

Matteo Vaccargiu; Sabrina Aufiero; Silvia Bartolucci; Ronnie de Souza Santos; Roberto Tonelli; Giuseppe Destefanis

arXiv:2603.24501·cs.SE·May 22, 2026


TL;DR

This case study of Kubernetes investigates how label-code alignment impacts collaboration, revealing its prevalence, stability, and influence on review dynamics across contributor experience levels.

Contribution

The paper introduces label-diff congruence, a measure of how well pull request labels align with the files a pull request modifies, and analyzes its association with collaboration and review behavior in the Kubernetes project.

Findings

01 46.6% of pull requests show perfect label-code alignment

02 Among core developers, higher congruence is associated with quieter reviews

03 Among newcomers, higher congruence correlates with increased engagement

Abstract

Labels on platforms such as GitHub support triage and coordination, yet little is known about how well they align with code modifications or how such alignment affects collaboration across contributor experience levels. We present a case study of the Kubernetes project, introducing label-diff congruence (the alignment between pull request labels and modified files) and examining its prevalence, stability, behavioral validation, and relationship to collaboration outcomes across contributor tiers. We analyse 18,020 pull requests (2014-2025) with area labels and complete file diffs, validate alignment through analysis of over one million review comments and label corrections, and test associations with time-to-merge and discussion characteristics using quantile regression and negative binomial models stratified by contributor experience. Congruence is prevalent (46.6% perfect…
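To make the idea of label-diff congruence concrete, the sketch below scores one pull request as the share of its modified files that fall under a path associated with at least one of its area labels. This is only an illustration: the abstract does not give the paper's exact formula, so the scoring rule, the `AREA_PATHS` mapping, and the `congruence` function are all assumptions introduced here, not the authors' method.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from area labels to path prefixes.
# Illustrative only; not taken from the paper or from Kubernetes tooling.
AREA_PATHS: Dict[str, List[str]] = {
    "area/kubectl": ["pkg/kubectl/", "staging/src/k8s.io/kubectl/"],
    "area/kubelet": ["pkg/kubelet/", "cmd/kubelet/"],
}


def congruence(
    labels: Iterable[str],
    files: Iterable[str],
    area_paths: Dict[str, List[str]] = AREA_PATHS,
) -> float:
    """Return the fraction of modified files covered by any label's prefixes.

    Assumed scoring rule: 1.0 means every modified file matches a labelled
    area ("perfect" alignment); 0.0 means none do, or there are no files.
    """
    # Collect every path prefix implied by the PR's area labels.
    prefixes = [p for label in labels for p in area_paths.get(label, [])]
    file_list = list(files)
    if not file_list:
        return 0.0
    covered = sum(
        1 for path in file_list if any(path.startswith(p) for p in prefixes)
    )
    return covered / len(file_list)


if __name__ == "__main__":
    score = congruence(
        ["area/kubectl"],
        ["pkg/kubectl/cmd/get.go", "pkg/kubelet/kubelet.go"],
    )
    print(f"congruence = {score:.2f}")
```

Under this assumed rule, a pull request labelled only `area/kubectl` that also touches a kubelet file scores below 1.0, which is the kind of partial mismatch a congruence measure is meant to surface.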
