Nucleus-Image: Sparse MoE for Image Generation

Chandan Akiti; Ajay Modukuri; Murali Nandan Nagarapu; Gunavardhan Akiti; Haozhe Liu

arXiv:2604.12163 · cs.CV · April 15, 2026

TL;DR

Nucleus-Image is a sparse MoE diffusion transformer for text-to-image generation that matches or exceeds leading models on GenEval, DPG-Bench, and OneIG-Bench while activating only about 2B of its 17B total parameters per forward pass.

Contribution

The paper presents a sparse MoE diffusion transformer with Expert-Choice Routing, a decoupled routing design that separates timestep-aware expert assignment from timestep-conditioned expert computation, optimized training strategies, and a large-scale dataset.
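To make Expert-Choice Routing concrete, here is a minimal NumPy sketch of the general technique, in which each expert picks its own top tokens instead of each token picking experts. It is not the paper's implementation: the function names, the single-matrix "experts", the softmax gating, and the `capacity` parameter are all assumptions chosen for illustration.

```python
import numpy as np

def expert_choice_route(token_states, router_weights, capacity):
    """Each expert selects its `capacity` highest-affinity tokens.

    token_states:   (num_tokens, dim)
    router_weights: (dim, num_experts)   # hypothetical router projection
    Returns token indices and gate values, both (num_experts, capacity).
    """
    logits = token_states @ router_weights               # (num_tokens, num_experts)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)            # softmax over experts, per token
    # Sort tokens by affinity within each expert column and keep the top `capacity`.
    chosen = np.argsort(-probs, axis=0)[:capacity].T     # (num_experts, capacity)
    gates = np.take_along_axis(probs.T, chosen, axis=1)  # matching affinities
    return chosen, gates

def sparse_moe_layer(token_states, router_weights, expert_weights, capacity):
    """Toy MoE layer: every expert is a single linear map (an assumption)."""
    chosen, gates = expert_choice_route(token_states, router_weights, capacity)
    out = np.zeros_like(token_states)
    for e in range(expert_weights.shape[0]):
        idx = chosen[e]  # indices are unique within one expert
        out[idx] += gates[e][:, None] * (token_states[idx] @ expert_weights[e])
    return out
```

Because every expert processes exactly `capacity` tokens, the load is balanced by construction. A token may be picked by several experts or by none, which is the usual trade-off of this routing style.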

Findings

1. Matches or exceeds leading models on GenEval, DPG-Bench, and OneIG-Bench.
2. Activates only approximately 2B parameters per forward pass.
3. Achieves high-quality image generation without post-training optimization.

Abstract

We present Nucleus-Image, a text-to-image generation model that establishes a new Pareto frontier in quality-versus-efficiency by matching or exceeding leading models on GenEval, DPG-Bench, and OneIG-Bench while activating only approximately 2B parameters per forward pass. Nucleus-Image employs a sparse mixture-of-experts (MoE) diffusion transformer architecture with Expert-Choice Routing that scales total model capacity to 17B parameters across 64 routed experts per layer. We adopt a streamlined architecture optimized for inference efficiency by excluding text tokens from the transformer backbone entirely and using joint attention that enables text KV sharing across timesteps. To improve routing stability when using timestep modulation, we introduce a decoupled routing design that separates timestep-aware expert assignment from timestep-conditioned expert computation. We construct a…
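The text KV sharing mentioned in the abstract can be illustrated with a toy single-layer sketch. If text tokens never pass through the backbone, their keys and values do not depend on the denoising timestep, so they can be projected once and reused at every step. Everything below is an assumption made for illustration (the weight names, the single attention layer, and the residual update standing in for a denoising step); the paper's actual layer structure and timestep conditioning are not reproduced.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over all keys.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def denoise_with_shared_text_kv(image_tokens, text_embeddings,
                                w_q, w_k, w_v, w_k_text, w_v_text, num_steps):
    # Text keys/values are projected once, outside the timestep loop.
    text_k = text_embeddings @ w_k_text
    text_v = text_embeddings @ w_v_text
    x = image_tokens
    for _ in range(num_steps):
        q = x @ w_q                                   # queries come from image tokens only
        k = np.concatenate([x @ w_k, text_k], axis=0) # joint attention: image + cached text keys
        v = np.concatenate([x @ w_v, text_v], axis=0)
        x = x + attention(q, k, v)                    # toy residual update, not a real sampler
    return x
```

In this sketch the text projections cost the same regardless of `num_steps`, while the image projections are recomputed at each step because the image tokens change.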
