Galaxy Zoo Evo: 1 million human-annotated images of galaxies
Mike Walmsley, Steven Bamford, Hugh Dickinson, Tobias Géron, Alexander J. Gordon, Annette M.N. Ferguson, Lucy Fortson, Sandor Kruk, Natalie Lines, Chris J. Lintott, Karen L. Masters, Robert G. Mann, James Pearson, Hayley Roberts, Anna M.M. Scaife, Stefan Schuldt

TL;DR
Galaxy Zoo Evo provides a large, detailed dataset of 823,000 galaxy images with 104 million crowdsourced labels, intended for pretraining and evaluating foundation models on galaxy images and for benchmarking domain adaptation.
Contribution
This paper introduces Galaxy Zoo Evo, a large-scale, richly labeled galaxy image dataset designed for training and evaluating foundation models in astronomy.
Findings
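Provides 104 million crowdsourced labels for 823,000 images from four telescopes
Includes four smaller label sets (167,000 galaxies in total) for downstream tasks such as finding strong lenses and describing galaxies from Euclid
Proposed as a real-world benchmark for domain adaptation (from terrestrial to astronomical images, or between telescopes)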
Abstract
We introduce Galaxy Zoo Evo, a labeled dataset for building and evaluating foundation models on images of galaxies. GZ Evo includes 104M crowdsourced labels for 823k images from four telescopes. Each image is labeled with a series of fine-grained questions and answers (e.g. "featured galaxy, two spiral arms, tightly wound, merging with another galaxy"). These detailed labels are useful for pretraining or finetuning. We also include four smaller sets of labels (167k galaxies in total) for downstream tasks of specific interest to astronomers, including finding strong lenses and describing galaxies from the new space telescope Euclid. We hope GZ Evo will serve as a real-world benchmark for computer vision topics such as domain adaptation (from terrestrial to astronomical, or between telescopes) or learning under uncertainty from crowdsourced labels. We also hope it will support a new…
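The abstract mentions learning under uncertainty from crowdsourced labels. As a minimal sketch, assuming each galaxy's labels can be represented as per-question answer counts (the question and answer names below are hypothetical, not GZ Evo's actual schema), the counts can be turned into vote fractions and into a count-weighted multinomial loss. This is one common way to use such labels, not necessarily the authors' method.

```python
from math import log

# Hypothetical vote counts for one galaxy: question -> {answer: number of volunteers}.
# Names are illustrative only; they are not taken from the GZ Evo release.
votes = {
    "smooth_or_featured": {"smooth": 4, "featured": 36},
    "spiral_arm_count": {"one": 0, "two": 28, "three_plus": 5},
    "bar_strength": {"none": 0, "weak": 0, "strong": 0},  # never reached in the question tree
}

def vote_fractions(answer_counts):
    """Convert raw answer counts for one question into fractions that sum to 1."""
    total = sum(answer_counts.values())
    if total == 0:
        # No volunteer answered (e.g. an earlier answer skipped this question): no target.
        return None
    return {answer: n / total for answer, n in answer_counts.items()}

def multinomial_nll(predicted, answer_counts, eps=1e-9):
    """Negative multinomial log-likelihood of the observed votes, up to a constant.

    Each answer's log-probability is weighted by how many volunteers chose it, so
    questions answered by more volunteers contribute more to the loss, and
    unanswered questions contribute nothing.
    """
    return -sum(n * log(predicted[answer] + eps) for answer, n in answer_counts.items())

if __name__ == "__main__":
    for question, counts in votes.items():
        print(question, vote_fractions(counts))
    good = {"smooth": 0.1, "featured": 0.9}
    bad = {"smooth": 0.9, "featured": 0.1}
    print(multinomial_nll(good, votes["smooth_or_featured"]))
    print(multinomial_nll(bad, votes["smooth_or_featured"]))
```

Keeping the raw counts rather than only a majority answer preserves how much volunteers disagreed, which is the uncertainty a model can learn from.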
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
