Breaking the Autoregressive Chain: Hyper-Parallel Decoding for Efficient LLM-Based Attribute Value Extraction

Theodore Glavas; Nikhita Vedula; Dushyanta Dhyani; Yilun Zhu; Shervin Malmasi

arXiv:2604.26209 · cs.CL · April 30, 2026

TL;DR

The paper introduces Hyper-Parallel Decoding (HPD), a decoding algorithm that speeds up offline LLM inference by generating independent output sequences in parallel, reducing inference time and cost by up to 13.8x.

Contribution

The paper presents a decoding algorithm that lets an LLM generate several output sequences in parallel within a single prompt. It applies to tasks whose output sequences are independent of one another, and it improves efficiency without sacrificing output quality.

Findings

01. Decodes up to 96 tokens in parallel per prompt by stacking multiple documents within a single prompt.

02. Reduces inference cost and time by up to 13.8x.

03. Works with all LLMs and applies to tasks whose output sequences are independent.

Abstract

Some text generation tasks, such as Attribute Value Extraction (AVE), require decoding multiple independent sequences from the same document context. While standard autoregressive decoding is slow due to its sequential nature, the independence between output sequences offers an opportunity for parallelism. We present Hyper-Parallel Decoding (HPD), a novel decoding algorithm that accelerates offline decoding by leveraging both shared memory and computation across batches. HPD enables out-of-order token generation through position ID manipulation, significantly improving efficiency. Experiments on AVE show that attribute-value pairs are conditionally independent, enabling us to parallelize value generation within each prompt. By further stacking multiple documents within a single prompt, we can decode up to 96 tokens in parallel per prompt. HPD works with all LLMs, and reduces both inference…
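
The abstract mentions position ID manipulation but does not spell out the mechanism. The sketch below shows one plausible way to pack several independent output branches behind a shared document prefix. It is an illustration under stated assumptions, not the authors' implementation. The function name `branch_layout`, the packed layout (prefix first, then each branch in turn), and the masking rule are all hypothetical. Each branch restarts its position IDs right after the prefix, and the attention mask lets a token see the prefix and earlier tokens of its own branch only.

```python
from typing import List, Tuple


def branch_layout(prefix_len: int, branch_lens: List[int]) -> Tuple[List[int], List[List[bool]]]:
    """Position IDs and attention mask for [shared prefix][branch 0][branch 1]...

    Hypothetical helper: mask[i][j] is True when token i may attend to token j.
    """
    position_ids = list(range(prefix_len))
    owner = [-1] * prefix_len  # -1 marks a shared-prefix token

    for branch, length in enumerate(branch_lens):
        # Assumption: every branch continues from the end of the prefix,
        # so sibling branches reuse the same position IDs.
        position_ids.extend(range(prefix_len, prefix_len + length))
        owner.extend([branch] * length)

    total = len(position_ids)
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(i + 1):  # causal: only earlier-or-same packed index
            # Visible if j is in the shared prefix or in the same branch as i.
            if owner[j] == -1 or owner[j] == owner[i]:
                mask[i][j] = True
    return position_ids, mask


if __name__ == "__main__":
    ids, mask = branch_layout(prefix_len=4, branch_lens=[2, 3])
    print(ids)
    print(mask[6])
```

With a 4-token prefix and branches of lengths 2 and 3, both branches start at position 4, and the first token of the second branch cannot see any token of the first. In such a layout, one forward pass could extend every branch by one token, which is the kind of per-prompt parallelism the abstract describes.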
