Be aware of overfitting by hyperparameter optimization!

Igor V. Tetko; Ruud van Deursen; Guillaume Godin

arXiv:2407.20786·cs.LG·November 26, 2024·2 cites

TL;DR

This paper examines the risk of overfitting during hyperparameter optimization for solubility prediction. It shows that pre-set hyperparameters can perform comparably to optimized ones, and that Transformer CNN, a representation learning method based on natural language processing that the study adds to the comparison, outperforms graph-based models in both accuracy and computational efficiency.

Contribution

The study reveals that hyperparameter optimization may lead to overfitting and shows that a Transformer CNN can outperform graph-based methods in solubility prediction tasks.

Findings

1. Hyperparameter optimization does not always improve model performance.
2. Pre-set hyperparameters give similar results while reducing computational effort by around 10,000 times.
3. Transformer CNN outperforms graph-based methods in most comparisons.

Abstract

Hyperparameter optimization is very frequently employed in machine learning. However, optimizing over a large parameter space could result in overfitting of models. In recent studies on solubility prediction, the authors of those studies collected seven thermodynamic and kinetic solubility datasets from different data sources. They used state-of-the-art graph-based methods and compared models developed for each dataset using different data cleaning protocols and hyperparameter optimization. In our study, we showed that hyperparameter optimization did not always result in better models, possibly due to overfitting when using the same statistical measures. Similar results could be obtained using pre-set hyperparameters, reducing the computational effort by a factor of around 10,000. We also extended the previous analysis by adding a representation learning method based on Natural Language Processing of…
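The overfitting argument can be illustrated with a toy simulation. This is a minimal sketch, not the authors' protocol: it assumes 1,000 hypothetical hyperparameter configurations that all share the same true accuracy (0.80), each scored on a 100-sample validation set with independent noise. Selecting the configuration with the highest validation score reports an inflated number, while re-scoring the winner on fresh data falls back toward the true accuracy. The function names `noisy_accuracy` and `simulate_selection_bias` and all parameter values are illustrative assumptions, and the independent-noise model ignores the correlation that a shared validation set induces between configurations.

```python
import random
import statistics


def noisy_accuracy(rng: random.Random, true_acc: float, n: int) -> float:
    """Accuracy measured on n samples when each prediction is correct with probability true_acc."""
    return sum(rng.random() < true_acc for _ in range(n)) / n


def simulate_selection_bias(n_configs=1000, n_val=100, n_test=100, true_acc=0.80, seed=0):
    """Toy model: every hypothetical configuration has the SAME true accuracy,
    so any apparent winner is chosen purely by evaluation noise."""
    rng = random.Random(seed)
    # Assumption: each configuration's validation score carries independent noise.
    val_scores = [noisy_accuracy(rng, true_acc, n_val) for _ in range(n_configs)]
    best_val = max(val_scores)  # the score an exhaustive search would report
    # Re-evaluate the winner on fresh, untouched data (its true accuracy is still true_acc).
    test_score = noisy_accuracy(rng, true_acc, n_test)
    return best_val, statistics.mean(val_scores), test_score


best_val, mean_val, test_score = simulate_selection_bias()
print(f"best validation accuracy:   {best_val:.3f}")
print(f"mean validation accuracy:   {mean_val:.3f}")
print(f"winner on fresh test data:  {test_score:.3f}")
```

The gap between the best validation score and the fresh test score is pure selection bias: no configuration is actually better, yet picking the maximum of many noisy estimates looks like an improvement. This is one mechanism by which an extensive search can appear to beat pre-set hyperparameters without generalizing any better.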

Taxonomy

Topics: Machine Learning and Data Classification

Methods: Attention Is All You Need · Label Smoothing · Adam · Linear Layer · Byte Pair Encoding · Layer Normalization · Softmax · Position-Wise Feed-Forward Layer · Dense Connections · Multi-Head Attention