Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?
Amritanshu Agrawal, Xueqi Yang, Rishabh Agrawal, Xipeng Shen, and Tim Menzies

TL;DR
This paper investigates when simple hyperparameter optimization methods like DODGE are sufficient for software analytics, showing they work well for low-dimensional data but not for higher-dimensional cases.
Contribution
It provides empirical evidence on the effectiveness of DODGE for low-dimensional software engineering datasets, guiding when to use simpler versus complex optimizers.
Findings
DODGE performs best on datasets with low intrinsic dimensionality.
DODGE is ineffective for datasets with high intrinsic dimensionality.
Most SE datasets examined are low-dimensional, favoring DODGE use.
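To make "intrinsic dimensionality" concrete, here is a minimal sketch of one common estimator, the correlation dimension: count the fraction of point pairs closer than a radius r, then fit the slope of log C(r) against log r. This summary does not say which estimator the paper uses, so the function name `correlation_dimension`, the percentile range for the radii, and the parameter `n_radii` are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def correlation_dimension(X, n_radii=10):
    """Rough intrinsic-dimension estimate via the correlation integral.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # All pairwise Euclidean distances (upper triangle, no self-pairs).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    pair_d = dists[np.triu_indices(n, k=1)]
    pair_d = pair_d[pair_d > 0]  # drop exact duplicates to keep logs finite

    # Assumed choice: probe radii between the 5th and 50th percentile
    # of the observed distances, spaced evenly on a log scale.
    r_lo, r_hi = np.percentile(pair_d, [5, 50])
    radii = np.geomspace(r_lo, r_hi, n_radii)

    # Correlation integral C(r): fraction of pairs within distance r.
    C = np.array([(pair_d <= r).mean() for r in radii])

    # The slope of log C(r) versus log r is the dimension estimate.
    slope, _intercept = np.polyfit(np.log(radii), np.log(C), 1)
    return float(slope)
```

On small samples this kind of estimate tends to run below the true dimension, so it is better read as a relative indicator (low versus high) than as an exact value. Under the findings above, a low estimate would suggest trying DODGE first, while a clearly high one would point toward a more complex optimizer.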
Abstract
How to make software analytics simpler and faster? One method is to match the complexity of analysis to the intrinsic complexity of the data being explored. For example, hyperparameter optimizers find the control settings for data miners that improve the predictions generated via software analytics. Sometimes, very fast hyperparameter optimization can be achieved by just DODGE-ing away from things tried before. But when is it wise to use DODGE and when must we use more complex (and much slower) optimizers? To answer this, we applied hyperparameter optimization to 120 SE data sets that explored bad smell detection, predicting GitHub issue close time, bug report analysis, defect prediction, and dozens of other non-SE problems. We find that DODGE works best for data sets with low "intrinsic dimensionality" (D = 3) and very poorly for higher-dimensional data (D over 8). Nearly…
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Software Reliability and Analysis Research
