TL;DR
Gambit is an open-source, rule-based name disambiguation tool that uses only name and email data. On version control data from the Gnome GTK project, it significantly outperforms two commonly used algorithms.
Contribution
The paper introduces gambit, a rule-based disambiguation tool that relies solely on name and email information, and shows that it is more accurate than two commonly used algorithms with similar characteristics.
Findings
Gambit achieves an F1 score of 0.985.
It significantly outperforms two commonly used algorithms with similar characteristics.
The tool is open source and was evaluated on manually disambiguated version control data from the Gnome GTK project.
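For readers unfamiliar with how an F1 score can be computed for a disambiguation result, the following is a minimal sketch. It assumes a pairwise definition (every pair of identities is classified as "same author" or "different author"), which is one common choice; the summary does not state which definition the paper uses. The names `same_author_pairs` and `pairwise_f1` and the toy labels are illustrative only.

```python
from itertools import combinations


def same_author_pairs(labels):
    """Return the index pairs that a labelling places in the same cluster."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_f1(truth, predicted):
    """Pairwise F1 (assumed definition, not necessarily the paper's)."""
    true_pairs = same_author_pairs(truth)
    pred_pairs = same_author_pairs(predicted)
    tp = len(true_pairs & pred_pairs)  # pairs correctly merged
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Toy data: five identities, two real authors ("a" and "b").
truth = ["a", "a", "a", "b", "b"]
# The prediction fails to merge the third identity with the first two.
predicted = [0, 0, 1, 2, 2]
score = pairwise_f1(truth, predicted)
```

In this toy case every predicted merge is correct (precision 1.0) but only two of the four true pairs are found (recall 0.5), so the score is 2/3.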
Abstract
Name disambiguation is a complex but highly relevant challenge whenever analysing real-world user data, such as data from version control systems. We propose gambit, a rule-based disambiguation tool that only relies on name and email information. We evaluate its performance against two commonly used algorithms with similar characteristics on manually disambiguated ground-truth data from the Gnome GTK project. Our results show that gambit significantly outperforms both algorithms, achieving an F1 score of 0.985.
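To illustrate what rule-based disambiguation on name and email alone can look like, here is a minimal sketch. It is not gambit's rule set, which this summary does not describe: the two rules (identical email, or identical normalized name), the helper names, and the sample identities are all assumptions made for illustration.

```python
import re
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    cleaned = re.sub(r"[^a-z ]", " ", ascii_name.lower())
    return " ".join(cleaned.split())


def same_person(a, b):
    """Hypothetical rules: identical email, or identical normalized name."""
    (name_a, email_a), (name_b, email_b) = a, b
    if email_a and email_a.lower() == email_b.lower():
        return True
    norm_a, norm_b = normalize_name(name_a), normalize_name(name_b)
    return bool(norm_a) and norm_a == norm_b


def disambiguate(identities):
    """Assign a cluster id to each (name, email) pair using union-find."""
    parent = list(range(len(identities)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(identities)):
        for j in range(i + 1, len(identities)):
            if same_person(identities[i], identities[j]):
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(identities))]


# Made-up commit identities for illustration.
identities = [
    ("Jane Doe", "jane@example.org"),
    ("jane doe", "jdoe@example.com"),
    ("J. Doe", "JDoe@example.com"),
    ("John Smith", "john@example.net"),
]
clusters = disambiguate(identities)
```

The first three identities end up in one cluster because matches are transitive: the first two share a normalized name, and the second and third share an email address. Exact-match rules like these are deliberately simplistic; a practical tool would need to tolerate variation such as abbreviated or reordered names.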
