Masked Generative Nested Transformers with Decode Time Scaling

Sahil Goyal; Debapriya Tula; Gagan Jain; Pradeep Shenoy; Prateek Jain; Sujoy Paul

arXiv:2502.00382·cs.CV·February 4, 2025

TL;DR

This paper introduces a novel nested transformer approach with decode time scaling to improve inference efficiency in visual generation, reducing computational costs while maintaining competitive quality.

Contribution

It proposes a decode time model scaling schedule and computation caching, enabling smaller models to process more tokens, significantly reducing inference costs without increasing model size.

Findings

01

Uses nearly 3x less compute than baseline methods.

02

Maintains competitive image and video generation quality.

03

Validates effectiveness on ImageNet, UCF101, and Kinetics600 datasets.

Abstract

Recent advances in visual generation have made significant strides in producing content of exceptional quality. However, most methods suffer from a fundamental problem - a bottleneck of inference computational efficiency. Most of these algorithms involve multiple passes over a transformer model to generate tokens or denoise inputs. However, the model size is kept consistent throughout all iterations, which makes it computationally expensive. In this work, we aim to address this issue primarily through two key ideas - (a) not all parts of the generation process need equal compute, and we design a decode time model scaling schedule to utilize compute effectively, and (b) we can cache and reuse some of the computation. Combining these two ideas leads to using smaller models to process more tokens while large models process fewer tokens. These different-sized models do not increase the…
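
The two ideas in the abstract can be illustrated with a toy cost model. The sketch below is not the authors' implementation: the cosine masking schedule, the equal-length stages, the smallest-first ordering, the linear per-token cost, and all function names (`masked_fraction`, `model_schedule`, `relative_compute`) are assumptions made for illustration. It only shows why pairing small sub-models with the token-heavy early iterations, and skipping already-decoded tokens through caching, lowers total compute relative to running the full model on every token at every iteration.

```python
import math


def masked_fraction(step, total_steps):
    """Fraction of tokens still masked at the start of `step` (cosine schedule, assumed)."""
    return math.cos(math.pi / 2 * step / total_steps)


def model_schedule(total_steps, width_fractions):
    """Pick a sub-model width for each decode iteration, smallest first (assumed ordering)."""
    sizes = sorted(width_fractions)
    # Integer arithmetic splits the iterations into roughly equal stages, one per size.
    return [sizes[step * len(sizes) // total_steps] for step in range(total_steps)]


def relative_compute(total_steps, num_tokens, width_fractions, use_cache=True):
    """Toy cost of the schedule relative to running the full model on all tokens."""
    cost = 0.0
    for step, width in enumerate(model_schedule(total_steps, width_fractions)):
        if use_cache:
            # With caching, only tokens that are still masked are processed.
            tokens = round(num_tokens * masked_fraction(step, total_steps))
        else:
            tokens = num_tokens
        cost += tokens * width  # toy assumption: per-token cost scales with width
    return cost / (total_steps * num_tokens)
```

For example, `model_schedule(4, [1.0, 0.5])` assigns the half-width model to the first two iterations and the full model to the last two, and `relative_compute(12, 256, [0.25, 0.5, 1.0])` returns a ratio below that of the full model alone. The ratios from this toy model are not comparable to the compute savings reported in the paper.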

Taxonomy

Topics: Cellular Automata and Applications