Plug-and-Play Context Feature Reuse for Efficient Masked Generation

Xuejie Liu; Anji Liu; Guy Van den Broeck; Yitao Liang

arXiv:2505.19089·cs.CV·May 27, 2025

TL;DR

ReCAP is a plug-and-play module that accelerates masked generative models by reusing context features, reducing inference time while maintaining high-quality image synthesis.

Contribution

ReCAP constructs low-cost decoding steps by reusing feature embeddings computed in earlier steps, which speeds up masked generative models without sacrificing generation quality.

Findings

01. ReCAP achieves up to 2.4x faster inference on ImageNet256.

02. ReCAP maintains high fidelity with reduced computation.

03. ReCAP is effective across multiple MGMs and architectures.

Abstract

Masked generative models (MGMs) have emerged as a powerful framework for image synthesis, combining parallel decoding with strong bidirectional context modeling. However, generating high-quality samples typically requires many iterative decoding steps, resulting in high inference costs. A straightforward way to speed up generation is to decode more tokens in each step, thereby reducing the total number of steps. When many tokens are decoded simultaneously, though, the model can only estimate their univariate marginal distributions independently and fails to capture the dependencies among them. As a result, reducing the number of steps significantly compromises generation fidelity. In this work, we introduce ReCAP (Reused Context-Aware Prediction), a plug-and-play module that accelerates inference in MGMs by constructing low-cost steps via reusing feature embeddings from previously decoded…
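The fidelity problem described in the abstract can be illustrated with a toy calculation. The sketch below is not taken from the paper: the two-token joint distribution and every function name are assumptions chosen for illustration. It compares decoding both tokens in one step (independent sampling from univariate marginals) with decoding them in two steps (the second token conditioned on the first).

```python
from itertools import product

# Hypothetical toy joint distribution over two binary tokens that must agree.
joint = {(0, 0): 0.5, (1, 1): 0.5}


def marginal(joint, index):
    """Univariate marginal of the token at position `index`."""
    probs = {}
    for tokens, p in joint.items():
        probs[tokens[index]] = probs.get(tokens[index], 0.0) + p
    return probs


def parallel_decode_distribution(joint):
    """One step: both tokens drawn independently from their marginals."""
    m0, m1 = marginal(joint, 0), marginal(joint, 1)
    return {(a, b): m0[a] * m1[b] for a, b in product(m0, m1)}


def sequential_decode_distribution(joint):
    """Two steps: draw token 0, then token 1 conditioned on token 0."""
    m0 = marginal(joint, 0)
    return {(a, b): m0[a] * (p / m0[a]) for (a, b), p in joint.items()}


def invalid_mass(dist, joint):
    """Probability placed on token pairs the true joint never produces."""
    return sum(p for tokens, p in dist.items() if joint.get(tokens, 0.0) == 0.0)


one_step = parallel_decode_distribution(joint)
two_step = sequential_decode_distribution(joint)
print("one-step invalid mass:", invalid_mass(one_step, joint))
print("two-step invalid mass:", invalid_mass(two_step, joint))
```

In this toy setting the one-step decoder puts half of its probability on mismatched pairs, while the two-step decoder reproduces the joint exactly. That is the trade-off between step count and fidelity that the abstract describes.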

Taxonomy

Topics: Augmented Reality Applications · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Balanced Selection