A Survey on Vision Autoregressive Model

Kai Jiang; Jiaxing Huang

arXiv:2411.08666 · cs.CV · November 19, 2024

Open Access

TL;DR

This survey comprehensively reviews the development, applications, and challenges of vision autoregressive models, highlighting their recent advancements in various vision tasks and multimodal generation.

Contribution

It provides a systematic taxonomy, analyzes recent progress, benchmarks methods, and discusses future directions for vision autoregressive models.

Findings

01

Autoregressive models excel in diverse vision tasks including image and video generation.

02

Recent advancements enable unified multimodal visual understanding and generation.

03

Benchmarking reveals strengths and limitations of current methods.

Abstract

Autoregressive models have demonstrated strong performance in natural language processing (NLP), with impressive scalability, adaptability, and generalizability. Inspired by this success in NLP, autoregressive models have recently been investigated intensively for computer vision. They perform next-token prediction by representing visual data as visual tokens, which enables autoregressive modelling for a wide range of vision tasks, from visual generation and visual understanding to the very recent multimodal generation that unifies visual generation and understanding in a single autoregressive model. This paper provides a systematic review of vision autoregressive models, developing a taxonomy of existing methods and highlighting their major contributions, strengths, and limitations, covering various vision tasks such as image generation, video…
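To make the next-token formulation in the abstract concrete, the following is a minimal sketch, not the method of any surveyed paper. It assumes an image has already been split into patches in raster order, maps each patch to the index of its nearest codebook vector (a visual token), and then scores or samples token sequences with the chain rule p(x) = ∏_t p(x_t | x_<t). The names `quantize`, `sequence_log_prob`, `generate`, and `toy_model` are hypothetical and introduced only for illustration; `toy_model` stands in for a trained network.

```python
import math
import random


def quantize(patches, codebook):
    """Map each patch (a list of floats) to the index of its nearest codebook vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: dist2(p, codebook[k]))
            for p in patches]


def sequence_log_prob(tokens, next_token_probs):
    """Chain rule: log p(x) = sum over t of log p(x_t | x_<t)."""
    return sum(math.log(next_token_probs(tokens[:t])[tokens[t]])
               for t in range(len(tokens)))


def generate(next_token_probs, length, rng):
    """Sample tokens one at a time, each conditioned on the prefix so far."""
    tokens = []
    for _ in range(length):
        probs = next_token_probs(tokens)
        tokens.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return tokens


def toy_model(prefix, vocab_size=4):
    """Hypothetical stand-in for a trained model: favours repeating the last token."""
    if not prefix:
        return [1.0 / vocab_size] * vocab_size
    probs = [0.1] * vocab_size
    probs[prefix[-1]] = 0.7  # 0.7 + 3 * 0.1 = 1.0 when vocab_size is 4
    return probs


# Two 2-dimensional "patches" and a 2-entry codebook (illustrative values only).
tokens = quantize([[0.0, 0.1], [0.9, 1.0]], [[0.0, 0.0], [1.0, 1.0]])  # [0, 1]
sample = generate(toy_model, length=6, rng=random.Random(0))
```

In a real system the codebook would come from a learned image tokenizer and `toy_model` would be replaced by a trained sequence model; the sampling loop and the chain-rule likelihood keep the same shape.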

Taxonomy

Topics: Remote-Sensing Image Classification · Remote Sensing and Land Use · Infrared Target Detection Methodologies