The effects of image augmentations when training machine learning models in astronomy

Leon H. Butterworth; Ashley Spindler

arXiv:2604.24862·astro-ph.IM·April 29, 2026

TL;DR

This study evaluates how image augmentations affect deep neural network performance in galaxy morphology classification, highlighting diminishing returns with larger datasets and suggesting simpler augmentations may suffice.

Contribution

It provides empirical evidence on the effectiveness of image augmentations in astronomy, emphasizing their limited benefit as dataset size increases and recommending practical augmentation strategies.

Findings

01. Augmentations generally improve model performance.

02. The benefit of augmentations diminishes with larger datasets.

03. Complex augmentations increase training time without clear performance gains.

Abstract

We measure the influence of image augmentations and training dataset size when training a deep neural network to classify galaxy morphology. Data augmentation is an integral step when training machine learning models, and astronomers often add augmentations on the assumption that they will always improve model performance. To determine whether this assumption necessarily holds, we train multiple versions of the same pre-existing Zoobot model using different image augmentations and different dataset sizes drawn from 230,000 galaxy images from Galaxy Zoo DECaLS. We find that adding image augmentations generally does improve a deep neural network's performance; however, this improvement diminishes significantly as the training dataset size increases. The choice of specific augmentations (provided they are sensible) does not seem to be as important as simply having augmentations as…
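The experimental design described in the abstract (the same model trained under different augmentation choices and different training-set sizes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `simple_augment`, `experiment_grid`, and `subsample_indices` are hypothetical, the flip-and-rotate augmentation is only one example of a "simple" augmentation, and every dataset size other than the 230,000 total quoted above is a placeholder. It does not use the Zoobot API.

```python
import itertools

import numpy as np


def simple_augment(image, rng):
    """Apply a random flip along each spatial axis and a random 90-degree rotation.

    Assumes `image` has shape (height, width, channels). These particular
    augmentations are illustrative, not the set evaluated in the paper.
    """
    if rng.random() < 0.5:
        image = np.flip(image, axis=1)  # left-right flip
    if rng.random() < 0.5:
        image = np.flip(image, axis=0)  # up-down flip
    k = int(rng.integers(0, 4))  # number of quarter turns: 0, 1, 2 or 3
    return np.rot90(image, k=k, axes=(0, 1))


def experiment_grid(augmentation_sets, dataset_sizes):
    """List every (augmentation set, training-set size) combination to train."""
    return [
        {"augmentations": name, "n_train": n}
        for name, n in itertools.product(augmentation_sets, dataset_sizes)
    ]


def subsample_indices(n_total, n_train, rng):
    """Draw a random training subset of the requested size, without replacement."""
    return rng.choice(n_total, size=n_train, replace=False)


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)

    # Hypothetical configuration: only the 230,000 total comes from the abstract.
    grid = experiment_grid(
        augmentation_sets=["none", "simple"],
        dataset_sizes=[1_000, 10_000, 230_000],
    )
    for run in grid:
        print(run)

    # Augment a dummy square image; the pixel values are only rearranged.
    dummy = rng.random((64, 64, 3))
    print(simple_augment(dummy, rng).shape)
```

Each entry in the grid would correspond to one training run, so comparing the "none" and "simple" runs at each size shows how the benefit of augmentation changes as the training set grows.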
