PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
Teng Zhou, Xiaoyu Zhang, Yongchuan Tang

TL;DR
PanoLlama introduces an autoregressive framework for endless, coherent panoramic image generation, overcoming size limitations of previous models and enabling diverse applications with state-of-the-art coherence, fidelity, and aesthetics.
Contribution
PanoLlama presents a novel training-free token redirection strategy that lets existing autoregressive models generate panoramic images of unlimited size with enhanced coherence.
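As a rough illustration of the general idea of reusing a fixed-size autoregressive model beyond its native length, here is a minimal sketch of sliding-window next-token generation. This is not the paper's token redirection algorithm, whose details are not given on this page. The names `toy_next_token` and `generate_beyond_window` are hypothetical, and the "model" is a deterministic stand-in rather than a real visual AR model.

```python
from typing import Callable, List


def toy_next_token(context: List[int], vocab_size: int = 16) -> int:
    """Hypothetical stand-in for a fixed-context AR model (not a real network)."""
    return (sum(context) + len(context)) % vocab_size


def generate_beyond_window(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    total_len: int,
    max_ctx: int,
) -> List[int]:
    """Generate total_len tokens while never feeding the model more than max_ctx tokens."""
    if max_ctx < 1:
        raise ValueError("max_ctx must be at least 1")
    tokens = list(prompt)
    while len(tokens) < total_len:
        # Assumption: only the most recent max_ctx tokens are used as context,
        # so the output can grow past the model's native size.
        window = tokens[-max_ctx:]
        tokens.append(next_token(window))
    return tokens


seq = generate_beyond_window(toy_next_token, prompt=[1, 2, 3], total_len=20, max_ctx=4)
print(len(seq))
```

In this toy setup the sequence length is bounded only by `total_len`, while each prediction depends on at most `max_ctx` earlier tokens. A real panorama generator would also have to handle the 2D token layout and positional encoding, which this sketch deliberately leaves out.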
Findings
Achieves state-of-the-art coherence (47.50%) in panoramic images.
Enables mask-free layout control and multi-guidance synthesis.
Supports endless panorama generation with improved fidelity and aesthetics.
Abstract
Panoramic Image Generation (PIG) aims to create coherent images of arbitrary lengths. Most existing methods fall in the joint diffusion paradigm, but their complex and heuristic crop connection designs often limit their ability to achieve multilevel coherence. By deconstructing this challenge into its core components, we find it naturally aligns with next-token prediction, leading us to adopt an autoregressive (AR) paradigm for PIG modeling. However, existing visual AR (VAR) models are limited to fixed-size generation, lacking the capability to produce panoramic images. In this paper, we propose PanoLlama, a novel framework that achieves endless and coherent panorama generation with the autoregressive paradigm. Our approach develops a training-free strategy that utilizes token redirection to overcome the size limitations of existing VAR models, enabling next-crop prediction in both…
Taxonomy
Topics: Advanced Numerical Analysis Techniques · Computer Graphics and Visualization Techniques · Mathematics, Computing, and Information Processing
Methods: Diffusion
