Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection
Kaixin Ding, Yang Zhou, Xi Chen, Miao Yang, Jiarong Ou, Rui Chen, Xin Tao, Hengshuang Zhao

TL;DR
Alchemist is a novel meta-gradient framework that automatically selects high-quality data subsets for text-to-image models, significantly improving visual fidelity and training efficiency by focusing on influential samples.
Contribution
It introduces the first scalable, automatic meta-gradient-based data selection method tailored for text-to-image model training, enhancing data efficiency and model performance.
Findings
Training on the 50% of the data selected by Alchemist outperforms training on the full dataset.
Alchemist improves visual quality and downstream task performance.
The framework is effective on both synthetic and web-crawled datasets.
Abstract
Recent advances in Text-to-Image (T2I) generative models, such as Imagen, Stable Diffusion, and FLUX, have led to remarkable improvements in visual quality. However, their performance is fundamentally limited by the quality of training data. Web-crawled and synthetic image datasets often contain low-quality or redundant samples, which lead to degraded visual fidelity, unstable training, and inefficient computation. Hence, effective data selection is crucial for improving data efficiency. Existing approaches to Text-to-Image data filtering rely on costly manual curation or on heuristic scoring based on single-dimensional features. Although meta-learning-based methods have been explored for LLMs, they have not yet been adapted to image modalities. To this end, we propose **Alchemist**, a meta-gradient-based framework to select a suitable subset from large-scale text-image data pairs. Our approach…
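The abstract is cut off before it describes the selection procedure, so the sketch below is not Alchemist's algorithm. It only illustrates the general idea of meta-gradient data scoring on a toy linear regression, using a common one-step lookahead approximation: a training sample scores highly when its gradient aligns with the validation gradient after the next update, meaning that upweighting it would lower the validation loss. The function names (`per_sample_grads`, `meta_gradient_scores`, `select_top_fraction`), the squared-error loss, and the hyperparameters are all assumptions made for illustration.

```python
import numpy as np


def per_sample_grads(theta, X, y):
    """Gradient of 0.5 * (x @ theta - y)**2 for every sample; shape (n, d)."""
    residual = X @ theta - y
    return residual[:, None] * X


def meta_gradient_scores(X_tr, y_tr, X_val, y_val, steps=50, lr=0.1):
    """Accumulate a one-step meta-gradient score for each training sample.

    With per-sample weights w_i and the update
        theta_next = theta - lr * mean_i(w_i * g_i),
    the derivative of the validation loss at theta_next with respect to w_i
    is proportional to -(g_i . g_val). We therefore accumulate g_i . g_val,
    so a higher score means the sample is more helpful.
    """
    n, d = X_tr.shape
    theta = np.zeros(d)
    scores = np.zeros(n)
    for _ in range(steps):
        g = per_sample_grads(theta, X_tr, y_tr)           # (n, d)
        theta_next = theta - lr * g.mean(axis=0)          # uniform weights
        g_val = per_sample_grads(theta_next, X_val, y_val).mean(axis=0)
        scores += g @ g_val                               # alignment per sample
        theta = theta_next
    return scores


def select_top_fraction(scores, fraction=0.5):
    """Return indices of the highest-scoring samples, best first."""
    k = int(len(scores) * fraction)
    return np.argsort(scores)[::-1][:k]
```

In a toy setting where some training labels are deliberately corrupted, keeping the top half by score tends to discard the corrupted samples. A real T2I pipeline would need a scalable approximation in place of exact per-sample gradients, which this sketch does not attempt.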
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Computer Graphics and Visualization Techniques
