Leveraging Predictions from Multiple Repositories to Improve Bot Detection

Natarajan Chidambaram; Alexandre Decan; Mehdi Golzadeh

arXiv:2203.16987 · cs.SE · April 1, 2022

TL;DR

This paper explores how aggregating bot detection predictions across multiple repositories enhances accuracy and coverage, leveraging the 'wisdom of the crowd' to improve existing tools like BoDeGHa.

Contribution

It introduces a method to combine predictions from multiple repositories, improving bot detection effectiveness beyond single-repository analysis.
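The idea of combining per-repository predictions can be illustrated with a small sketch. This page does not say how the predictions are combined. The majority vote below, the tie policy (a tied account is left undecided), the input format, and the names `aggregate_predictions` and `count_corrected` are all hypothetical. They only show how several labels for the same account, each obtained from a single-repository detector such as BoDeGHa, might be reconciled into one label, and how the per-repository predictions that diverge from that label can be counted.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: (account, repository, label) triples, where label
# is "bot" or "human" as predicted for that account within one repository.
Prediction = Tuple[str, str, str]


def aggregate_predictions(predictions: List[Prediction]) -> Dict[str, Optional[str]]:
    """Combine per-repository labels into one label per account by majority vote.

    Assumption: majority voting with ties left undecided (None); this is an
    illustrative choice, not necessarily the paper's aggregation strategy.
    """
    votes: Dict[str, Counter] = {}
    for account, _repo, label in predictions:
        votes.setdefault(account, Counter())[label] += 1

    result: Dict[str, Optional[str]] = {}
    for account, counter in votes.items():
        ranked = counter.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result[account] = None  # tie between "bot" and "human": undecided
        else:
            result[account] = ranked[0][0]
    return result


def count_corrected(predictions: List[Prediction]) -> int:
    """Count per-repository predictions that disagree with the aggregated label."""
    aggregated = aggregate_predictions(predictions)
    return sum(
        1
        for account, _repo, label in predictions
        if aggregated[account] is not None and label != aggregated[account]
    )


if __name__ == "__main__":
    # Made-up example data, not taken from the paper.
    preds = [
        ("alice", "repo-a", "human"),
        ("alice", "repo-b", "human"),
        ("alice", "repo-c", "bot"),
        ("ci-helper", "repo-a", "bot"),
        ("ci-helper", "repo-b", "bot"),
        ("carol", "repo-a", "bot"),
        ("carol", "repo-b", "human"),
    ]
    print(aggregate_predictions(preds))  # {'alice': 'human', 'ci-helper': 'bot', 'carol': None}
    print(count_corrected(preds))  # 1
```

In this toy data, the single "bot" prediction for `alice` in `repo-c` diverges from her other two predictions and is overridden by the vote, while `carol` has an even split and stays undecided. Leaving ties unresolved rather than forcing a label is a conservative choice; a weighted vote (for example, by the number of comments behind each prediction) would be a plausible alternative.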

Findings

01. An increased number of predictions when multiple repositories are considered

02. Many diverging predictions (an account labelled differently in different repositories) can be corrected by aggregation

03. Preliminary results show promising improvements

Abstract

Contemporary social coding platforms such as GitHub facilitate collaborative distributed software development. Developers engaged in these platforms often use machine accounts (bots) for automating effort-intensive or repetitive activities. Determining whether a contributor corresponds to a bot or a human account is important in socio-technical studies, for example, to assess the positive and negative impact of using bots, analyse the evolution of bots and their usage, identify top human contributors, and so on. BoDeGHa is one of the bot detection tools that have been proposed in the literature. It relies on comment activity within a single repository to predict whether an account is driven by a bot or by a human. This paper presents preliminary results on how the effectiveness of BoDeGHa can be improved by combining the predictions obtained from many repositories at once. We found that…