Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation

Hoigi Seo; Wongi Jeong; Jae-sun Seo; Se Young Chun

arXiv:2502.08690 · cs.LG · February 14, 2025

TL;DR

Skrr introduces a layer-skipping and layer-reuse strategy for the text encoders of text-to-image models, substantially reducing their memory usage while maintaining high image quality and outperforming existing pruning methods.

Contribution

The paper presents a novel pruning approach called Skrr that selectively skips and reuses transformer layers in text encoders to make text-to-image (T2I) diffusion models more memory-efficient.
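The skip and re-use idea can be sketched on a toy layer stack. This is a minimal illustration, not the authors' code: the plan format (`None` for a skipped slot, an index naming which stored layer runs in that slot) and the helpers `run_encoder`, `layers_to_keep`, and `make_layer` are hypothetical, and each "layer" is a plain Python function standing in for a transformer block.

```python
from typing import Callable, List, Optional, Set

# A "layer" here is any function mapping a hidden state to a new hidden state.
# Real transformer blocks act on [tokens, dim] tensors; a flat list keeps the sketch small.
Layer = Callable[[List[float]], List[float]]


def run_encoder(layers: List[Layer], plan: List[Optional[int]], x: List[float]) -> List[float]:
    """Run a layer stack under a hypothetical skip/re-use plan.

    plan[i] is None -> slot i is skipped and the hidden state passes through unchanged.
    plan[i] == i    -> slot i runs its own layer as usual.
    plan[i] == j    -> slot i re-uses the weights of layer j instead of its own.
    """
    if len(plan) != len(layers):
        raise ValueError("plan must have one entry per layer slot")
    for src in plan:
        if src is None:
            continue
        x = layers[src](x)
    return x


def layers_to_keep(plan: List[Optional[int]]) -> Set[int]:
    """Only layers referenced by at least one slot must stay in memory."""
    return {src for src in plan if src is not None}


def make_layer(offset: float) -> Layer:
    # Toy layer: adds a constant so the effect of each slot is easy to trace.
    return lambda h: [v + offset for v in h]


if __name__ == "__main__":
    layers = [make_layer(k) for k in (1.0, 2.0, 3.0, 4.0)]
    plan = [0, 1, 1, None]  # keep layers 0 and 1, re-use layer 1 in slot 2, skip slot 3
    print(run_encoder(layers, plan, [0.0, 10.0]))  # [5.0, 15.0]
    print(sorted(layers_to_keep(plan)))            # [0, 1]
```

Only layers 0 and 1 need to be stored, so half of this toy stack's block weights could be freed, while slot 2 still applies a transformation instead of being dropped. Re-using a stored layer keeps the depth of the computation without adding any weights.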

Findings

1. Text encoders can demand up to 8x more memory than denoising modules; Skrr targets this overhead.

2. Maintains comparable image quality even at high sparsity levels.

3. Outperforms existing blockwise pruning methods in memory efficiency.
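A back-of-the-envelope estimate shows why removing encoder layers translates directly into memory savings: weight memory is roughly the number of stored parameters times the bytes per parameter, and a re-used layer is stored only once. The helper `encoder_weight_bytes` and the layer count and parameter sizes below are hypothetical round numbers, not figures from the paper, and the estimate ignores activations and framework overhead.

```python
def encoder_weight_bytes(
    n_layers_stored: int,
    params_per_layer: int,
    other_params: int,
    bytes_per_param: int = 2,  # 2 bytes per weight for fp16/bf16
) -> int:
    """Approximate weight memory of a text encoder.

    n_layers_stored counts distinct transformer blocks kept in memory;
    a block that is re-used in several slots is counted once.
    other_params covers embeddings, final norms, and similar non-block weights.
    """
    total_params = n_layers_stored * params_per_layer + other_params
    return total_params * bytes_per_param


if __name__ == "__main__":
    # Hypothetical encoder: 24 blocks of 200M parameters plus 100M other parameters.
    full = encoder_weight_bytes(24, 200_000_000, 100_000_000)
    pruned = encoder_weight_bytes(12, 200_000_000, 100_000_000)
    print(full / 1e9, pruned / 1e9)  # 9.8 5.0 (GB)
    print(round(full / pruned, 2))   # 1.96
```

Halving the stored blocks in this example cuts weight memory by slightly less than half, because the embeddings and other non-block parameters remain.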

Abstract

Large-scale text encoders in text-to-image (T2I) diffusion models have demonstrated exceptional performance in generating high-quality images from textual prompts. Unlike denoising modules that rely on multiple iterative steps, text encoders require only a single forward pass to produce text embeddings. However, despite their minimal contribution to total inference time and floating-point operations (FLOPs), text encoders demand significantly higher memory usage, up to eight times more than denoising modules. To address this inefficiency, we propose Skip and Re-use layers (Skrr), a simple yet effective pruning strategy specifically designed for text encoders in T2I diffusion models. Skrr exploits the inherent redundancy in transformer blocks by selectively skipping or reusing certain layers in a manner tailored for T2I tasks, thereby reducing memory consumption without compromising…
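One common way to locate redundant transformer blocks is to measure how little each block changes its input: a block whose output is nearly parallel to its input is a candidate for skipping. The sketch below uses mean per-token cosine similarity for this. It is a generic heuristic swapped in for illustration, not the T2I-tailored criterion Skrr uses, and the function names are hypothetical. Hidden states are assumed to have been collected from a forward pass on some calibration prompts.

```python
import math
from typing import List, Sequence

Tokens = List[List[float]]  # hidden state: one vector per token


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as uncorrelated
    return dot / (norm_a * norm_b)


def block_similarity(h_in: Tokens, h_out: Tokens) -> float:
    """Mean per-token cosine similarity between a block's input and output."""
    sims = [cosine(u, v) for u, v in zip(h_in, h_out)]
    return sum(sims) / len(sims)


def pick_blocks_to_skip(hidden_states: List[Tokens], k: int) -> List[int]:
    """hidden_states[i] is the input to block i and hidden_states[i + 1] its output.

    Returns the indices of the k blocks that change their inputs the least.
    """
    n_blocks = len(hidden_states) - 1
    scores = [block_similarity(hidden_states[i], hidden_states[i + 1]) for i in range(n_blocks)]
    ranked = sorted(range(n_blocks), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # One token, two dimensions, three blocks (toy values).
    states = [
        [[1.0, 0.0]],
        [[1.0, 0.1]],  # block 0 barely rotates the state
        [[0.0, 1.0]],  # block 1 changes direction sharply
        [[0.0, 2.0]],  # block 2 only rescales
    ]
    print(pick_blocks_to_skip(states, 2))  # [0, 2]
```

Cosine similarity ignores magnitude, so block 2, which only rescales its input, scores as fully redundant under this heuristic.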

Videos

Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation (SlidesLive)

Taxonomy

Topics: Video Analysis and Summarization

Methods: Diffusion · Pruning · Contrastive Language-Image Pre-training