Leveraging Predictions from Multiple Repositories to Improve Bot Detection
Natarajan Chidambaram, Alexandre Decan, Mehdi Golzadeh

TL;DR
This paper explores how aggregating bot detection predictions across multiple repositories enhances accuracy and coverage, leveraging the 'wisdom of the crowd' to improve existing tools like BoDeGHa.
Contribution
It introduces a method to combine predictions from multiple repositories, improving bot detection effectiveness beyond single-repository analysis.
Findings
Combining repositories increases the number of cases for which a prediction can be made
Many diverging predictions can be corrected by aggregation
Preliminary results show promising improvements
Abstract
Contemporary social coding platforms such as GitHub facilitate collaborative distributed software development. Developers engaged in these platforms often use machine accounts (bots) for automating effort-intensive or repetitive activities. Determining whether a contributor corresponds to a bot or a human account is important in socio-technical studies, for example, to assess the positive and negative impact of using bots, analyse the evolution of bots and their usage, identify top human contributors, and so on. BoDeGHa is one of the bot detection tools that have been proposed in the literature. It relies on comment activity within a single repository to predict whether an account is driven by a bot or by a human. This paper presents preliminary results on how the effectiveness of BoDeGHa can be improved by combining the predictions obtained from many repositories at once. We found that…
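To make the idea of combining per-repository predictions concrete, here is a minimal sketch in Python. It assumes a simple majority vote over the labels that a single-repository detector assigns to the same account in different repositories. The abstract does not specify the aggregation rule, so the vote, the function name `aggregate_predictions`, the "unknown" label for ties, and all repository and account names are illustrative assumptions, not the authors' method or BoDeGHa's API.

```python
from collections import Counter, defaultdict


def aggregate_predictions(per_repo_predictions):
    """Combine per-repository labels into one label per account.

    `per_repo_predictions` is an iterable of (repository, account, label)
    tuples, where label is "bot" or "human". Majority voting is an
    assumption made for this sketch; ties are reported as "unknown".
    """
    votes = defaultdict(Counter)
    for _repository, account, label in per_repo_predictions:
        votes[account][label] += 1

    aggregated = {}
    for account, counts in votes.items():
        bot_votes, human_votes = counts["bot"], counts["human"]
        if bot_votes > human_votes:
            aggregated[account] = "bot"
        elif human_votes > bot_votes:
            aggregated[account] = "human"
        else:
            aggregated[account] = "unknown"  # diverging predictions, no majority
    return aggregated


# Hypothetical predictions, invented for illustration only.
example_predictions = [
    ("repo-a", "ci-helper", "bot"),
    ("repo-b", "ci-helper", "bot"),
    ("repo-c", "ci-helper", "human"),  # diverging prediction, outvoted
    ("repo-a", "alice", "human"),
    ("repo-b", "carol", "bot"),
    ("repo-c", "carol", "human"),      # tie, stays unresolved
]

aggregated = aggregate_predictions(example_predictions)
```

In this toy input, the single diverging "human" label for `ci-helper` is outvoted by the two "bot" labels, while `carol` has one vote each way and is left as "unknown". A real pipeline might weight votes by comment volume or by the detector's confidence instead; that choice is left open here.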
