LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding

Doohyuk Jang; Sihwan Park; June Yong Yang; Yeonsung Jung; Jihun Yun; Souvik Kundu; Sung-Yub Kim; Eunho Yang

arXiv:2410.03355·cs.CV·March 4, 2025

TL;DR

LANTERN introduces a relaxed decoding method that significantly accelerates visual autoregressive models by addressing token selection ambiguity, enabling more flexible token use without sacrificing image quality.

Contribution

The paper proposes LANTERN, a novel relaxed decoding approach that improves speculative decoding efficiency in visual AR models by leveraging token interchangeability in latent space.
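The idea of token interchangeability can be illustrated with a minimal sketch. It assumes that "interchangeable" tokens are the k nearest codebook entries in latent space and that the relaxation pools the target model's probability over that neighborhood; the function names, the toy codebook, and the value of k are hypothetical and are not taken from the paper or its repository. Note that any relaxation of this kind no longer reproduces the target distribution exactly, so the neighborhood size trades speed against fidelity.

```python
import math


def nearest_neighbors(codebook, idx, k):
    """Return indices of the k codebook entries closest to entry `idx`
    (Euclidean distance). The entry itself is always included first."""
    center = codebook[idx]
    order = sorted(range(len(codebook)),
                   key=lambda j: math.dist(center, codebook[j]))
    return order[:k]


def standard_accept_prob(p, q, x):
    """Standard speculative decoding: accept drafted token x
    with probability min(1, p[x] / q[x])."""
    return min(1.0, p[x] / q[x])


def relaxed_accept_prob(p, q, x, codebook, k):
    """Hypothetical relaxation: pool the target probability over the
    k nearest latent neighbors of x before applying the same test."""
    pooled = sum(p[j] for j in nearest_neighbors(codebook, x, k))
    return min(1.0, pooled / q[x])


# Toy setup (illustrative values only): tokens 0, 1, 2 are close in
# latent space, token 3 is far away.
codebook = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
p = [0.10, 0.30, 0.30, 0.30]  # target model probabilities
q = [0.70, 0.10, 0.10, 0.10]  # draft model probabilities
x = 0                         # token proposed by the draft model

standard = standard_accept_prob(p, q, x)
relaxed = relaxed_accept_prob(p, q, x, codebook, k=3)
print(f"standard: {standard:.3f}, relaxed: {relaxed:.3f}")
```

In this toy case the target spreads its probability across three nearby tokens, so the drafted token is rarely accepted under the standard test but is accepted once its neighbors' probability is pooled.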

Findings

01

LANTERN increases speed-up by 1.75x compared to greedy decoding.

02

LANTERN increases speed-up by 1.82x compared to random sampling.

03

The method maintains image quality and semantic coherence.

Abstract

Auto-Regressive (AR) models have recently gained prominence in image generation, often matching or even surpassing the performance of diffusion models. However, one major limitation of AR models is their sequential nature: they process tokens one at a time, which slows generation compared to models such as GANs or diffusion-based methods that operate more efficiently. While speculative decoding has proven effective for accelerating LLMs by generating multiple tokens in a single forward pass, its application to visual AR models remains largely unexplored. In this work, we identify a challenge in this setting, which we term "token selection ambiguity", wherein visual AR models frequently assign uniformly low probabilities to tokens, hampering the performance of speculative decoding. To overcome this challenge, we propose a relaxed acceptance condition referred to as LANTERN that…

Code & Models

Repositories

jadohu/LANTERN · PyTorch

Videos

LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding · SlidesLive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Advanced Vision and Imaging

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion