NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

TL;DR
NUWA-Infinity introduces an autoregressive over autoregressive model for infinite visual synthesis, enabling high-resolution, arbitrarily-sized images and long videos with efficient dependency modeling and flexible generation order.
Contribution
It proposes a novel hierarchical autoregressive generation mechanism with a Nearby Context Pool and Arbitrary Direction Controller for scalable, high-quality visual synthesis.
Findings
Supports high-resolution, arbitrary-sized image generation
Enables long-duration video synthesis
Outperforms previous models in resolution and flexibility
Abstract
In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos. An autoregressive over autoregressive generation mechanism is proposed to deal with this variable-size generation task, where a global patch-level autoregressive model considers the dependencies between patches, and a local token-level autoregressive model considers dependencies between visual tokens within each patch. A Nearby Context Pool (NCP) is introduced to cache related patches that have already been generated as the context for the current patch being generated, which can significantly save computation costs without sacrificing patch-level dependency modeling. An Arbitrary Direction Controller (ADC) is used to decide suitable generation orders for different visual synthesis tasks and learn…
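The two-level mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the raster generation order, the square neighbourhood of size `radius`, the eviction rule, and the helper names `nearby_context`, `sample_token`, and `generate` are all assumptions made for illustration, and `sample_token` is a hypothetical stand-in for a learned token-level model. The sketch only shows the control flow: an outer loop over patches, an inner loop over tokens within a patch, and a bounded cache of nearby patches used as context.

```python
import random
from typing import Dict, List, Tuple

Patch = List[int]        # a patch is a flat list of visual-token ids
Coord = Tuple[int, int]  # (row, col) position of a patch in the grid


def nearby_context(pool: Dict[Coord, Patch], pos: Coord, radius: int) -> List[Patch]:
    """Return cached patches within `radius` grid steps of `pos` (assumed rule)."""
    r, c = pos
    return [patch for (pr, pc), patch in sorted(pool.items())
            if abs(pr - r) <= radius and abs(pc - c) <= radius]


def sample_token(context: List[Patch], prefix: Patch,
                 rng: random.Random, vocab: int) -> int:
    """Hypothetical stand-in for a learned token-level autoregressive model."""
    seen = sum(sum(p) for p in context) + sum(prefix)
    return (seen + rng.randrange(vocab)) % vocab


def generate(rows: int, cols: int, tokens_per_patch: int = 4, vocab: int = 16,
             radius: int = 1, seed: int = 0) -> Tuple[Dict[Coord, Patch], int]:
    """Generate a rows x cols grid of patches; also return the peak cache size."""
    rng = random.Random(seed)
    pool: Dict[Coord, Patch] = {}    # toy analogue of a nearby-context cache
    output: Dict[Coord, Patch] = {}
    peak = 0
    for r in range(rows):            # global level: one patch at a time
        for c in range(cols):        # raster order is an assumption here
            context = nearby_context(pool, (r, c), radius)
            patch: Patch = []
            for _ in range(tokens_per_patch):  # local level: one token at a time
                patch.append(sample_token(context, patch, rng, vocab))
            output[(r, c)] = patch
            pool[(r, c)] = patch
            # Evict rows that can no longer fall inside any future neighbourhood.
            for key in [k for k in pool if k[0] < r - radius]:
                del pool[key]
            peak = max(peak, len(pool))
    return output, peak


if __name__ == "__main__":
    grid, peak = generate(rows=4, cols=3)
    print(len(grid), "patches generated; peak cache size:", peak)
```

In this sketch the cache never holds more than `(radius + 1) * cols` patches, regardless of how many rows are generated, which is the kind of saving a bounded context cache is meant to provide when the output size is unbounded.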
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
