FireRed-Image-Edit-1.0 Technical Report
Super Intelligence Team: Changhao Qiao, Chao Hui, Chen Li, Cunzheng Wang, Dejia Song, Jiale Zhang, Jing Li, Qiang Xiang, Runqi Wang, Shuang Sun, Wei Zhu, Xu Tang, Yao Hu, Yibo Chen, Yuhao Huang, Yuxuan Duan, Zhiyi Chen, Ziyuan Guo

TL;DR
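FireRed-Image-Edit is a diffusion transformer for instruction-based image editing. It reaches state-of-the-art results through large-scale data curation, multi-stage training (pre-training, supervised fine-tuning, and reinforcement learning), and new training techniques such as a Multi-Condition Aware Bucket Sampler and Asymmetric Gradient Optimization, and it is supported by a new comprehensive benchmark.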
Contribution
The paper presents a novel diffusion transformer model with advanced training strategies and a large, high-quality dataset for instruction-based image editing, along with a new benchmark suite.
Findings
Achieves state-of-the-art performance on REDEdit-Bench and public benchmarks.
Introduces new techniques, including the Multi-Condition Aware Bucket Sampler and Asymmetric Gradient Optimization.
Demonstrates strong semantic coverage and instruction alignment in image editing tasks.
Abstract
We present FireRed-Image-Edit, a diffusion transformer for instruction-based image editing that achieves state-of-the-art performance through systematic optimization of data curation, training methodology, and evaluation design. We construct a 1.6B-sample training corpus, comprising 900M text-to-image and 700M image editing pairs from diverse sources. After rigorous cleaning, stratification, auto-labeling, and two-stage filtering, we retain over 100M high-quality samples balanced between generation and editing, ensuring strong semantic coverage and instruction alignment. Our multi-stage training pipeline progressively builds editing capability via pre-training, supervised fine-tuning, and reinforcement learning. To improve data efficiency, we introduce a Multi-Condition Aware Bucket Sampler for variable-resolution batching and Stochastic Instruction Alignment with dynamic prompt…
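The abstract names a Multi-Condition Aware Bucket Sampler for variable-resolution batching but does not describe how it works. As a rough illustration of the general idea only, the sketch below shows ordinary resolution bucketing: samples are grouped so that every batch shares one target resolution and one number of conditioning images, which keeps tensor shapes uniform within a batch. The bucket list, the dictionary keys (`width`, `height`, `num_conditions`), and both function names are assumptions made for this example and are not taken from the report.

```python
import random
from collections import defaultdict

# Hypothetical bucket resolutions (width, height); not taken from the report.
BUCKETS = [(512, 512), (640, 384), (384, 640)]


def nearest_bucket(width, height):
    """Return the bucket whose aspect ratio is closest to the sample's."""
    ratio = width / height
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))


def bucketed_batches(samples, batch_size, seed=0):
    """Group sample indices so each batch shares one bucket and one condition count.

    `samples` is a list of dicts with the hypothetical keys
    'width', 'height', and 'num_conditions'.
    Returns a shuffled list of (key, indices) pairs, where key is
    (bucket_resolution, num_conditions).
    """
    groups = defaultdict(list)
    for idx, sample in enumerate(samples):
        key = (
            nearest_bucket(sample["width"], sample["height"]),
            sample["num_conditions"],
        )
        groups[key].append(idx)

    rng = random.Random(seed)
    batches = []
    for key, indices in groups.items():
        rng.shuffle(indices)  # randomize order inside each group
        for start in range(0, len(indices), batch_size):
            batches.append((key, indices[start:start + batch_size]))
    rng.shuffle(batches)  # interleave groups across the epoch
    return batches
```

Each returned batch can be resized to its bucket resolution and stacked without padding. Whether the report's sampler keys on condition count in this way, or on something else, is not stated in the abstract.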
Code & Models
- 🤗 FireRedTeam/FireRed-Image-Edit-1.1 · model · 8.0k dl · ♡ 197
- 🤗 rmgn888/FireRed-Image-Edit-1.1 · model · 10 dl · ♡ 1
- 🤗 drbaph/FireRed-Image-Edit-1.1_ComfyUI_Quants · model · 980 dl · ♡ 10
- 🤗 Adeennn/FireRed-Image-Edit-1.1 · model · 28 dl · ♡ 2
- 🤗 Alanyanz/FireRed-Image-Edit-1.1 · model · 10 dl
- 🤗 number7even/FireRed-Image-Edit-1.1 · model · 10 dl
- 🤗 MuhammedIjas555444/FireRed-Image-Edit-1.1 · model · 15 dl
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques · Multimodal Machine Learning Applications
