A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys

Yufeng Luo; Adam D. Myers; Alex Drlica-Wagner; Dario Dematties; Salma Borchani; Francisco Valdes; Arjun Dey; David Schlegel; Rongpu Zhou; and DESI Legacy Imaging Surveys Team

arXiv:2507.12784·astro-ph.IM·March 2, 2026


TL;DR

This paper presents a semi-supervised machine learning pipeline that pairs a vision transformer (ViT), trained via self-supervised learning, with a k-Nearest Neighbor (kNN) classifier to identify poor-quality exposures in large astronomical imaging surveys, reducing reliance on manual inspection.

Contribution

The authors develop a novel semi-supervised approach using vision transformers and clustering analysis for scalable image quality assessment in large astronomical surveys.

Findings

01 Successfully identified 780 problematic exposures in DECaLS DR11

02 Pipeline achieves high accuracy in classifying image quality

03 Method reduces manual effort in large-scale survey data quality control

Abstract

As the data volume of astronomical imaging surveys rapidly increases, traditional methods for image anomaly detection, such as visual inspection by human experts, are becoming impractical. We introduce a machine-learning-based approach to detect poor-quality exposures in large imaging surveys, with a focus on the DECam Legacy Survey (DECaLS) in regions of low extinction (i.e., $E(B-V) < 0.04$). Our semi-supervised pipeline integrates a vision transformer (ViT), trained via self-supervised learning (SSL), with a k-Nearest Neighbor (kNN) classifier. We train and validate our pipeline using a small set of labeled exposures observed by surveys with the Dark Energy Camera (DECam). A clustering-space analysis of where our pipeline places images labeled in "good" and "bad" categories suggests that our approach can efficiently and accurately determine the quality of exposures. Applied to new…
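The classification step described in the abstract (a kNN classifier operating on ViT embeddings) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not state the distance metric, the value of k, or the embedding size, so the sketch uses cosine similarity, k = 5, 16-dimensional vectors, and synthetic random vectors standing in for real ViT embeddings. The function name `knn_predict` and the 0 = "good", 1 = "bad" label encoding are hypothetical.

```python
import numpy as np


def knn_predict(train_emb, train_labels, query_emb, k=5):
    """Majority-vote kNN over cosine similarity (assumed metric, not from the paper)."""
    # L2-normalize so that a dot product equals cosine similarity.
    train = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    query = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = query @ train.T                       # shape (n_query, n_train)
    nearest = np.argsort(-sims, axis=1)[:, :k]   # indices of the k most similar
    votes = train_labels[nearest]                # shape (n_query, k), values 0 or 1
    # With odd k there are no ties; label 1 wins when it holds the majority.
    return (votes.mean(axis=1) > 0.5).astype(int)


# Synthetic stand-ins for ViT embeddings: two well-separated clusters.
rng = np.random.default_rng(0)
dim = 16
good_center = np.zeros(dim)
good_center[0] = 1.0
bad_center = np.zeros(dim)
bad_center[1] = 1.0

# Small labeled set: 20 "good" (0) and 20 "bad" (1) exposures.
train_emb = np.vstack([
    good_center + 0.05 * rng.normal(size=(20, dim)),
    bad_center + 0.05 * rng.normal(size=(20, dim)),
])
train_labels = np.array([0] * 20 + [1] * 20)

# Unlabeled exposures to classify: three near each cluster.
query_emb = np.vstack([
    good_center + 0.05 * rng.normal(size=(3, dim)),
    bad_center + 0.05 * rng.normal(size=(3, dim)),
])

pred = knn_predict(train_emb, train_labels, query_emb, k=5)
print(pred)
```

In a real pipeline the synthetic arrays would be replaced by embeddings produced by the trained ViT for labeled and unlabeled exposures; the small labeled set is what makes the overall approach semi-supervised.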
