Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation
Hoigi Seo, Wongi Jeong, Jae-sun Seo, Se Young Chun

TL;DR
Skrr prunes the text encoder of a text-to-image (T2I) diffusion model by skipping and re-using transformer layers, reducing memory usage while keeping image quality comparable and outperforming existing blockwise pruning methods.
Contribution
The paper presents a novel pruning approach called Skrr that selectively skips and reuses transformer layers in text encoders for memory-efficient T2I diffusion models.
Findings
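Text encoders in T2I diffusion models use up to eight times more memory than the denoising modules, despite contributing little to inference time and FLOPs.
Skrr maintains comparable image quality at high sparsity levels.
Skrr outperforms existing blockwise pruning methods in memory efficiency.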
Abstract
Large-scale text encoders in text-to-image (T2I) diffusion models have demonstrated exceptional performance in generating high-quality images from textual prompts. Unlike denoising modules that rely on multiple iterative steps, text encoders require only a single forward pass to produce text embeddings. However, despite their minimal contribution to total inference time and floating-point operations (FLOPs), text encoders demand significantly higher memory usage, up to eight times more than denoising modules. To address this inefficiency, we propose Skip and Re-use layers (Skrr), a simple yet effective pruning strategy specifically designed for text encoders in T2I diffusion models. Skrr exploits the inherent redundancy in transformer blocks by selectively skipping or reusing certain layers in a manner tailored for T2I tasks, thereby reducing memory consumption without compromising…
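The skip and re-use idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the layers are hypothetical residual maps on plain Python lists, the names `make_toy_layer`, `run_with_plan`, `stored_layers`, and `sparsity` are made up for illustration, and the plans are hand-picked, since the excerpt does not describe how Skrr selects which layers to skip or re-use.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layer(scale: float) -> Layer:
    """Hypothetical stand-in for a transformer block: x + scale * x."""
    def layer(x: Vector) -> Vector:
        return [v + scale * v for v in x]
    return layer


def run_with_plan(layers: Sequence[Layer],
                  plan: Sequence[Optional[int]],
                  x: Vector) -> Vector:
    """Run the encoder depth by depth.

    plan[d] is the index of the stored layer to execute at depth d,
    or None to skip that depth (the input passes through unchanged).
    Re-use means the same stored index appears at more than one depth.
    """
    for idx in plan:
        if idx is None:
            continue
        x = layers[idx](x)
    return x


def stored_layers(plan: Sequence[Optional[int]]) -> List[int]:
    """Indices of layers whose weights must stay in memory."""
    return sorted({idx for idx in plan if idx is not None})


def sparsity(plan: Sequence[Optional[int]], n_layers: int) -> float:
    """Fraction of the original layers that no longer need to be stored."""
    return 1.0 - len(stored_layers(plan)) / n_layers


# Six toy layers with different (assumed) scales.
LAYERS = [make_toy_layer(0.1 * (k + 1)) for k in range(6)]

FULL_PLAN = [0, 1, 2, 3, 4, 5]           # unpruned encoder
SKIP_PLAN = [0, 1, None, None, 4, 5]     # layers 2 and 3 skipped
REUSE_PLAN = [0, 1, 1, None, 4, 5]       # layer 1 re-used at depth 2

if __name__ == "__main__":
    x = [1.0, 2.0]
    for name, plan in [("full", FULL_PLAN), ("skip", SKIP_PLAN), ("reuse", REUSE_PLAN)]:
        out = run_with_plan(LAYERS, plan, x)
        print(name, stored_layers(plan), round(sparsity(plan, len(LAYERS)), 3), out)
```

In this sketch the skip and re-use plans both store four of the six layers, so they have the same memory footprint; re-use simply executes an already-stored layer at an extra depth, which costs compute but no additional weights.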
Taxonomy
Topics: Video Analysis and Summarization
Methods: Diffusion · Pruning · Contrastive Language-Image Pre-training
