MARS: Enabling Autoregressive Models Multi-Token Generation

Ziqi Jin; Lei Wang; Ziwei Luo; Aixin Sun

arXiv:2604.07023 · cs.CL · April 9, 2026

TL;DR

MARS is a lightweight fine-tuning method that enables autoregressive models to predict multiple tokens per step, improving throughput and maintaining accuracy without architectural changes.

Contribution

MARS is a simple fine-tuning approach that lets an instruction-tuned AR model generate multiple tokens per forward pass. It adds no parameters and no architectural changes, preserves accuracy, and increases inference throughput.

Findings

01

When generating one token per forward pass, MARS matches or exceeds the AR baseline on six standard benchmarks.

02

When accepting multiple tokens per step, MARS achieves 1.5-1.7x throughput.

03

The paper also develops a block-level KV caching strategy for faster batch inference.

Abstract

Autoregressive (AR) language models generate text one token at a time, even when consecutive tokens are highly predictable given earlier context. We introduce MARS (Mask AutoRegreSsion), a lightweight fine-tuning method that teaches an instruction-tuned AR model to predict multiple tokens per forward pass. MARS adds no architectural modifications, no extra parameters, and produces a single model that can still be called exactly like the original AR model with no performance degradation. Unlike speculative decoding, which maintains a separate draft model alongside the target, or multi-head approaches such as Medusa, which attach additional prediction heads, MARS requires only continued training on existing instruction data. When generating one token per forward pass, MARS matches or exceeds the AR baseline on six standard benchmarks. When allowed to accept multiple tokens per step, it…
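To make the idea of accepting several tokens per forward pass concrete, here is a minimal toy sketch. It is not the MARS implementation: the truncated abstract does not state the acceptance rule, so the confidence threshold used here is an assumption, and `decode`, `toy_propose`, `block`, and `threshold` are hypothetical names introduced only for illustration. The stand-in "model" is a plain function, not a neural network.

```python
from typing import Callable, List, Tuple

# Hypothetical interface (an assumption, not the paper's API): given the
# tokens so far, return up to `block` (token, confidence) proposals for the
# next positions, all produced by ONE forward pass.
Proposer = Callable[[List[int], int], List[Tuple[int, float]]]


def decode(propose: Proposer, prompt: List[int], max_new: int,
           block: int = 4, threshold: float = 0.9) -> Tuple[List[int], int]:
    """Return (generated tokens, number of forward passes)."""
    out: List[int] = []
    passes = 0
    while len(out) < max_new:
        proposals = propose(prompt + out, block)
        passes += 1
        # Always keep the first token (an ordinary AR step), then keep
        # extending while confidence stays at or above the threshold.
        # This acceptance rule is an illustrative assumption.
        accepted = [proposals[0][0]]
        for tok, conf in proposals[1:]:
            if conf < threshold:
                break
            accepted.append(tok)
        # Never emit more than max_new tokens in total.
        out.extend(accepted[: max_new - len(out)])
    return out, passes


def toy_propose(context: List[int], block: int) -> List[Tuple[int, float]]:
    # Stand-in for a model: the token at position p is p itself, and
    # positions divisible by 3 are "hard" (low confidence).
    start = len(context)
    return [(start + i, 0.5 if (start + i) % 3 == 0 else 0.95)
            for i in range(block)]


tokens, passes = decode(toy_propose, prompt=[0], max_new=12)
ar_tokens, ar_passes = decode(toy_propose, prompt=[0], max_new=12, block=1)
print(passes, ar_passes)
```

In this toy setup the multi-token loop needs 5 forward passes where the one-token loop (`block=1`) needs 12, and both produce the same sequence by construction. Real throughput gains depend on how often a model's extra predictions are accepted and on per-pass cost, which this sketch does not model.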

Code & Models

Repositories

xalp/MARS (GitHub)
