GBM Returns the Best Prediction Performance among Regression Approaches: A Case Study of Stack Overflow Code Quality
Sherlock A. Licorish, Brendon Woodford, Lakmal Kiyaduwa Vithanage, Osayande Pascal Omondiagbe

TL;DR
This study evaluates various regression techniques to predict Stack Overflow Java code quality, finding that Gradient Boosting Machine (GBM) offers the best predictive performance among tested approaches.
Contribution
It introduces a regression-based approach to predict Stack Overflow code quality and demonstrates that GBM outperforms other regression methods in this context.
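To show what "GBM" means as a regression technique, here is a minimal from-scratch sketch of gradient boosting for regression. It uses squared-error loss and depth-1 trees (stumps): each round fits a stump to the current residuals and adds a damped copy of it to the ensemble. This is an illustration under stated assumptions, not the authors' pipeline. The library, hyperparameters and features used in the paper are not given here. The names `GradientBoostingRegressor`, `fit_stump` and `mae`, the features `lines_of_code` and `question_score`, and the toy values are all hypothetical.

```python
"""Minimal gradient boosting regressor (squared-error loss, depth-1 stumps).

Illustrative sketch only: feature names and data are hypothetical,
not the paper's dataset, features, or configuration.
"""
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    feature: int      # index of the feature to split on
    threshold: float  # rows with x[feature] <= threshold go left
    left: float       # prediction for the left branch
    right: float      # prediction for the right branch

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X: List[Sequence[float]], r: List[float]) -> Stump:
    """Return the single split that minimises squared error on residuals r."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= t]
            right = [ri for x, ri in zip(X, r) if x[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, Stump(j, t, ml, mr)
    if best is None:  # no valid split (all rows identical): predict a constant
        m = sum(r) / len(r)
        best = Stump(0, float("inf"), m, m)
    return best


class GradientBoostingRegressor:
    def __init__(self, n_estimators: int = 100, learning_rate: float = 0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.base = 0.0
        self.stumps: List[Stump] = []

    def fit(self, X: List[Sequence[float]], y: List[float]) -> "GradientBoostingRegressor":
        # Start from the mean, the best constant prediction under squared loss.
        self.base = sum(y) / len(y)
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_estimators):
            # The negative gradient of 0.5 * (y - f)^2 w.r.t. f is the residual y - f.
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            # Shrink each stump's contribution by the learning rate.
            pred = [p + self.learning_rate * stump.predict(x) for p, x in zip(pred, X)]
        return self

    def predict(self, x: Sequence[float]) -> float:
        return self.base + sum(self.learning_rate * s.predict(x) for s in self.stumps)


def mae(model: GradientBoostingRegressor, X: List[Sequence[float]], y: List[float]) -> float:
    """Mean absolute error, one common metric for comparing regressors."""
    return sum(abs(model.predict(x) - yi) for x, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Hypothetical toy rows: [lines_of_code, question_score] -> violation count.
    X = [[5, 1], [10, 3], [20, 2], [40, 8], [60, 5], [80, 12]]
    y = [1, 2, 4, 7, 10, 14]
    model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1).fit(X, y)
    baseline = sum(abs(sum(y) / len(y) - yi) for yi in y) / len(y)
    print(f"mean-baseline MAE: {baseline:.2f}, GBM training MAE: {mae(model, X, y):.2f}")
```

A comparison like the paper's would score each candidate regressor on held-out data rather than on its training rows as this toy does. The shrinkage via `learning_rate` trades more rounds for less overfitting, which is the usual design choice in gradient boosting.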
Findings
GBM achieved the highest prediction accuracy among six regression approaches.
Longer code snippets tend to have more violations.
Questions with higher scores attract more views and answers, correlating with more code errors.
Abstract
Practitioners are increasingly dependent on publicly available resources to support their knowledge needs during software development. This has in turn placed a spotlight on these resources, and researchers have reported mixed outcomes regarding their quality. Stack Overflow, in particular, has been studied extensively, with evidence showing that code resources on this platform can at times be of poor quality. Limited research has explored the variables or factors that predict code quality on Stack Overflow; instead, prior work has focused on ranking content, identifying defects and predicting future content. In many instances, the approaches used for prediction are not evaluated to identify the best techniques. Contextualizing the Stack Overflow code quality problem as regression-based, we examined the variables that predict Stack Overflow (Java) code quality, and the…
Taxonomy
Topics: Machine Learning and Data Classification · Data Stream Mining Techniques
