Be aware of overfitting by hyperparameter optimization!
Igor V. Tetko, Ruud van Deursen, Guillaume Godin

TL;DR
This paper investigates overfitting risks in hyperparameter optimization for solubility prediction, demonstrating that pre-set hyperparameters can perform comparably to optimized ones and introducing a Transformer CNN method that outperforms graph-based models in efficiency and accuracy.
Contribution
The study reveals that hyperparameter optimization may lead to overfitting and shows that a Transformer CNN can outperform graph-based methods in solubility prediction tasks.
Findings
Hyperparameter optimization does not always improve model performance.
Pre-set hyperparameters gave similar results while reducing the computational effort by around 10,000 times.
Transformer CNN outperforms graph-based methods in most comparisons.
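The first finding can be illustrated with a toy simulation of selection bias. This is a minimal sketch and not the authors' protocol: it assumes a set of hypothetical hyperparameter configurations that are all equally good (their residuals are drawn from the same unit-variance Gaussian), so any difference in validation score is pure noise. The function name `selection_bias_demo` and all parameter values are illustrative assumptions.

```python
import math
import random


def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def selection_bias_demo(n_configs=200, n_val=30, n_test=2000, seed=0):
    """Toy simulation (assumption: every configuration has the same true RMSE of 1).

    Returns (best_val, preset_val, test_score):
      best_val    validation RMSE of the configuration chosen by "optimization"
      preset_val  validation RMSE of one fixed, pre-set configuration
      test_score  RMSE of the chosen configuration on fresh test residuals
    """
    rng = random.Random(seed)

    # Validation RMSE for each hypothetical hyperparameter configuration.
    val_scores = [
        rmse([rng.gauss(0.0, 1.0) for _ in range(n_val)])
        for _ in range(n_configs)
    ]

    best_val = min(val_scores)   # the score that would be reported after the search
    preset_val = val_scores[0]   # a configuration fixed before seeing any scores

    # The chosen configuration is no better than the others, so its residuals
    # on unseen data come from the same distribution.
    test_score = rmse([rng.gauss(0.0, 1.0) for _ in range(n_test)])
    return best_val, preset_val, test_score


if __name__ == "__main__":
    best_val, preset_val, test_score = selection_bias_demo()
    print(f"best validation RMSE after search: {best_val:.3f}")
    print(f"pre-set configuration, validation: {preset_val:.3f}")
    print(f"chosen configuration, fresh test:  {test_score:.3f}")
```

In this setup the search has nothing real to find, yet the best validation score looks better than the score on fresh data. Reporting the same statistic that was used for selection therefore overstates the benefit of the search, which is why an independent test set or nested cross-validation is usually preferred for the final comparison.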
Abstract
Hyperparameter optimization is very frequently employed in machine learning. However, an optimization of a large space of parameters could result in overfitting of models. In recent studies on solubility prediction the authors collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each dataset using different data cleaning protocols and hyperparameter optimization. In our study we showed that hyperparameter optimization did not always result in better models, possibly due to overfitting when using the same statistical measures. Similar results could be calculated using pre-set hyperparameters, reducing the computational effort by around 10,000 times. We also extended the previous analysis by adding a representation learning method based on Natural Language Processing of…
Taxonomy
Topics: Machine Learning and Data Classification
Methods: Attention Is All You Need · Label Smoothing · Adam · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention
