Joint Autoregressive and Graph Models for Software and Developer Social Networks
Rima Hazra, Hardik Aggarwal, Pawan Goyal, Animesh Mukherjee, and Soumen Chakrabarti

TL;DR
This paper introduces novel methods combining autoregressive models and graph features to predict bug-prone packages and developer-package assignments in a large software dependency network, supported by a comprehensive Ubuntu dataset.
Contribution
It formalizes two new problems in software social networks and proposes a novel integrated modeling approach that improves prediction accuracy over simple autoregression.
Findings
Network-derived features enhance prediction performance.
The dataset includes over 25,000 packages and 280,000 bug reports.
Proposed models outperform baseline autoregressive methods.
Abstract
Social network research has focused on hyperlink graphs, bibliographic citations, friend/follow patterns, influence spread, etc. Large software repositories also form a highly valuable networked artifact, usually in the form of a collection of packages, their developers, dependencies among them, and bug reports. This "social network of code" is rarely studied by social network researchers. We introduce two new problems in this setting. These problems are well-motivated in the software engineering community but not closely studied by social network scientists. The first is to identify packages that are most likely to be troubled by bugs in the immediate future, thereby demanding the greatest attention. The second is to recommend developers to packages for the next development cycle. Simple autoregression can be applied to historical data for both problems, but we propose a novel method…
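The "simple autoregression" baseline mentioned in the abstract can be pictured with a short sketch. This is an illustration only, not the authors' proposed method, whose details are cut off above. It assumes each package has a history of bug counts per development cycle, fits an AR(p) model per package by ordinary least squares, and ranks packages by the predicted next-cycle count. The function names (`fit_ar`, `predict_next`, `rank_packages`), the package names, and the data layout are all hypothetical.

```python
import numpy as np

def fit_ar(series, p):
    """Fit AR(p) coefficients plus an intercept by ordinary least squares (p >= 1)."""
    y = np.asarray(series, dtype=float)
    # Each row holds the p values preceding the target (oldest first) and a constant 1.
    X = np.array([np.append(y[t - p:t], 1.0) for t in range(p, len(y))])
    targets = y[p:]
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coef  # last entry is the intercept

def predict_next(series, coef):
    """Predict the next value from the last p observations."""
    p = len(coef) - 1
    y = np.asarray(series, dtype=float)
    return float(np.append(y[-p:], 1.0) @ coef)

def rank_packages(history, p=2):
    """Rank packages by predicted next-cycle bug count, highest first.

    history: dict mapping a package name to its list of per-cycle bug counts
    (a hypothetical layout chosen for this sketch).
    """
    scores = {pkg: predict_next(counts, fit_ar(counts, p))
              for pkg, counts in history.items()}
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    # Made-up counts for two hypothetical packages.
    history = {"pkg_a": [1, 2, 3, 4, 5, 6], "pkg_b": [2, 2, 2, 2, 2, 2]}
    print(rank_packages(history))
```

A baseline of this kind sees only each package's own history. According to the summary above, the paper's contribution is to combine such autoregressive signals with features derived from the dependency and developer network.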
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Complex Network Analysis Techniques
