Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection

Kaixin Ding; Yang Zhou; Xi Chen; Miao Yang; Jiarong Ou; Rui Chen; Xin Tao; Hengshuang Zhao

arXiv:2512.16905·cs.CV·December 19, 2025

TL;DR

Alchemist is a novel meta-gradient framework that automatically selects high-quality data subsets for text-to-image models, significantly improving visual fidelity and training efficiency by focusing on influential samples.

Contribution

It introduces the first scalable, automatic meta-gradient-based data selection method tailored for text-to-image model training, enhancing data efficiency and model performance.

Findings

1. Training on the 50% of the data selected by Alchemist outperforms training on the full dataset.
2. Alchemist improves visual quality and downstream task performance.
3. The framework is effective on both synthetic and web-crawled datasets.

Abstract

Recent advances in Text-to-Image (T2I) generative models, such as Imagen, Stable Diffusion, and FLUX, have led to remarkable improvements in visual quality. However, their performance is fundamentally limited by the quality of their training data. Web-crawled and synthetic image datasets often contain low-quality or redundant samples, which lead to degraded visual fidelity, unstable training, and inefficient computation. Hence, effective data selection is crucial for improving data efficiency. Existing approaches to T2I data filtering rely on costly manual curation or on heuristic scoring based on single-dimensional features. Although meta-learning-based methods have been explored for LLMs, they have not yet been adapted to image modalities. To this end, we propose **Alchemist**, a meta-gradient-based framework that selects a suitable subset from large-scale text-image data pairs. Our approach…
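To make the idea of meta-gradient data selection concrete, here is a minimal toy sketch. It is not the Alchemist method, whose details are cut off in the abstract above. It assumes a 1-D linear regression model, uniform initial sample weights, a single lookahead gradient step, and a small clean validation set. All function names (`meta_gradient_scores`, `select_top_fraction`, and the helpers) are hypothetical. Each training sample is scored by how much increasing its weight would lower the validation loss after one update, and the top-scoring fraction is kept.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sample_grad(theta, x, y):
    # Gradient of 0.5 * (x . theta - y)^2 with respect to theta.
    r = dot(x, theta) - y
    return [r * xi for xi in x]

def one_step(theta, X, Y, w, lr):
    # One weighted gradient step: theta' = theta - lr * sum_i w_i * g_i.
    grads = [sample_grad(theta, x, y) for x, y in zip(X, Y)]
    step = [sum(wi * g[j] for wi, g in zip(w, grads)) for j in range(len(theta))]
    return [t - lr * s for t, s in zip(theta, step)]

def val_loss(theta, Xv, Yv):
    # Mean squared-error loss (with the 0.5 factor) on the validation set.
    return sum(0.5 * (dot(x, theta) - y) ** 2 for x, y in zip(Xv, Yv)) / len(Yv)

def meta_gradient_scores(theta, X, Y, Xv, Yv, lr=0.1):
    # Assumption: uniform starting weights and a single lookahead step.
    n = len(Y)
    w = [1.0 / n] * n
    theta_next = one_step(theta, X, Y, w, lr)
    # Mean validation gradient evaluated at the lookahead parameters.
    gv = [0.0] * len(theta)
    for x, y in zip(Xv, Yv):
        g = sample_grad(theta_next, x, y)
        gv = [a + b / len(Yv) for a, b in zip(gv, g)]
    # d val_loss / d w_i = -lr * (gv . g_i); the score is its negation,
    # so a higher score means the sample helps validation more.
    return [lr * dot(gv, sample_grad(theta, x, y)) for x, y in zip(X, Y)]

def select_top_fraction(scores, fraction=0.5):
    # Keep the indices of the highest-scoring fraction of samples.
    k = max(1, int(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]
```

For example, with four training points at `x = 1`, two labelled `2.0` (consistent with the validation pair) and two labelled `-2.0` (corrupted), the two consistent points receive positive scores and are the ones kept at `fraction=0.5`. A real T2I pipeline would replace the closed-form gradients with automatic differentiation through a diffusion model's training step, which this sketch does not attempt.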

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques