Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?
Amritanshu Agrawal, Xueqi Yang, Rishabh Agrawal, Rahul Yedida, Xipeng Shen, Tim Menzies

TL;DR
This paper investigates when simple hyperparameter optimization methods like DODGE are effective for software analytics, showing they work well on low-dimensional data but not on high-dimensional data.
Contribution
It provides empirical evidence, across a range of software engineering datasets, on how DODGE's effectiveness for hyperparameter tuning depends on each dataset's intrinsic dimensionality.
Findings
DODGE performs best on low-dimensional datasets (u ~ 3).
DODGE is ineffective on high-dimensional datasets (u > 8).
Most SE datasets are low-dimensional, making DODGE broadly applicable.
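These findings depend on measuring a dataset's intrinsic dimensionality, which the summary does not define. One common estimator is the correlation dimension: the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than radius r. The sketch below assumes that estimator; the paper's exact method and radii may differ, and the function names and radii are hypothetical.

```python
import math
import random


def pairwise_distances(points):
    """All Euclidean distances between distinct pairs of points."""
    return [math.dist(p, q)
            for i, p in enumerate(points)
            for q in points[i + 1:]]


def correlation_integral(dists, r):
    """C(r): fraction of point pairs that lie closer than radius r."""
    return sum(d < r for d in dists) / len(dists)


def correlation_dimension(points, r_small, r_large):
    """Estimate intrinsic dimensionality as the slope of log C(r) vs log r.

    Uses only two radii; a fuller version would fit a line over many radii.
    """
    dists = pairwise_distances(points)
    c_small = correlation_integral(dists, r_small)
    c_large = correlation_integral(dists, r_large)
    if c_small == 0:
        raise ValueError("r_small is too small: no pairs fall within it")
    return (math.log(c_large) - math.log(c_small)) / (
        math.log(r_large) - math.log(r_small))


if __name__ == "__main__":
    rng = random.Random(0)
    # A straight line embedded in 3-D space: 3 columns, but intrinsically 1-D.
    line = [(t, 2 * t, -t) for t in (rng.random() for _ in range(600))]
    # Points filling a 3-D cube: intrinsically 3-D.
    cube = [(rng.random(), rng.random(), rng.random()) for _ in range(600)]
    # Expect a value near 1 for the line, and a value somewhat below 3 for
    # the cube (finite sample and cube boundaries pull it down).
    print(round(correlation_dimension(line, 0.05, 0.2), 2))
    print(round(correlation_dimension(cube, 0.05, 0.2), 2))
```

The example shows why column count is a poor proxy: the line has three columns but an estimate near 1. A dataset's estimate can then be compared against the u ~ 3 and u > 8 regimes listed above.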
Abstract
How can we make software analytics simpler and faster? One method is to match the complexity of analysis to the intrinsic complexity of the data being explored. For example, hyperparameter optimizers find the control settings for data miners that improve the predictions generated via software analytics. Sometimes, very fast hyperparameter optimization can be achieved by "DODGE-ing", i.e., simply steering away from settings that lead to similar conclusions. But when is it wise to use that simple approach, and when must we use more complex (and much slower) optimizers? To answer this, we applied hyperparameter optimization to 120 SE data sets that explored bad smell detection, predicting GitHub issue close time, bug report analysis, defect prediction, and dozens of other non-SE problems. We find that the simple DODGE works best for data sets with low "intrinsic dimensionality" (u ~ 3) and…
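The abstract describes DODGE only as steering away from settings that lead to similar conclusions. The sketch below illustrates that idea under our own simplifying assumptions; it is not the authors' DODGE implementation and omits details of the published algorithm. Each candidate hyperparameter value carries a weight. If a configuration scores within epsilon of any earlier score, its values are down-weighted ("same conclusion again"); otherwise they are up-weighted, and the next configuration is assembled from the highest-weighted values. The names `dodge_like_search`, `space`, `evaluate`, `epsilon`, and `budget`, and the default values, are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


def dodge_like_search(
    space: Dict[str, List[str]],
    evaluate: Callable[[Dict[str, str]], float],
    epsilon: float = 0.2,
    budget: int = 30,
    seed: int = 0,
) -> Tuple[Optional[Dict[str, str]], float]:
    """Simplified epsilon-based search (illustrative, not the authors' DODGE).

    space:    hyperparameter name -> candidate values (hypothetical format)
    evaluate: returns a score to maximize for one configuration
    epsilon:  scores closer than this are treated as "the same conclusion"
    """
    rng = random.Random(seed)
    # Every candidate value starts with weight 0.
    weights = {(name, v): 0 for name, values in space.items() for v in values}
    seen_scores: List[float] = []
    best_cfg: Optional[Dict[str, str]] = None
    best_score = float("-inf")

    for _ in range(budget):
        # Build a configuration from the highest-weighted value of each
        # hyperparameter, breaking ties at random.
        cfg = {}
        for name, values in space.items():
            top = max(weights[(name, v)] for v in values)
            cfg[name] = rng.choice([v for v in values if weights[(name, v)] == top])

        score = evaluate(cfg)
        # Within epsilon of an earlier score: nothing new was learned, so
        # steer away from these values; otherwise reward them.
        similar = any(abs(score - s) < epsilon for s in seen_scores)
        for name, v in cfg.items():
            weights[(name, v)] += -1 if similar else 1
        seen_scores.append(score)

        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


if __name__ == "__main__":
    # Hypothetical toy objective: one hyperparameter with four values.
    toy = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.3}
    print(dodge_like_search({"learner": list(toy)},
                            lambda c: toy[c["learner"]], epsilon=0.05))
    # prints ({'learner': 'b'}, 0.9)
```

Once a value has been tried, repeating it yields the same score and drives its weight down, so untried values (still at weight 0) win the next tie. That is the "steering away" behavior in miniature; larger epsilon values make the search discard more settings as "similar."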
