Orthrus: Memory-Efficient Parallel Token Generation via Dual-View Diffusion

Chien Van Nguyen; Chaitra Hegde; Van Cuong Pham; Ryan A. Rossi; Franck Dernoncourt; Thien Huu Nguyen

arXiv:2605.12825 · cs.LG · May 19, 2026

1 Repo 3 Models

TL;DR

Orthrus is a dual-architecture framework that pairs the exact generation fidelity of autoregressive LLMs with the parallel token generation of diffusion models, reporting lossless inference with up to 7.8x speedup.

Contribution

The paper augments a frozen LLM with a lightweight, trainable module so that an autoregressive view and a parallel diffusion view coexist in one Transformer, and it reports high-fidelity parallel token generation with minimal memory and parameter overhead.

Findings

01. Up to 7.8x speedup in token generation
02. Exact consensus guarantees lossless inference
03. Minimal memory and parameter overhead

Abstract

We introduce Orthrus, a simple and efficient dual-architecture framework that unifies the exact generation fidelity of autoregressive Large Language Models (LLMs) with the high-speed parallel token generation of diffusion models. The sequential nature of standard autoregressive decoding represents a fundamental bottleneck for high-throughput inference. While diffusion language models attempt to break this barrier via parallel generation, they suffer from significant performance degradation, high training costs, and a lack of rigorous convergence guarantees. Orthrus resolves this dichotomy natively. Designed to seamlessly integrate into existing Transformers, the framework augments a frozen LLM with a lightweight, trainable module to create a parallel diffusion view alongside the standard autoregressive view. In this unified system, both views attend to the exact same high-fidelity…
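The abstract is truncated before it describes how the two views reach agreement, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes that "exact consensus" means a token proposed in parallel is kept only when it equals the token the autoregressive view would have produced greedily, in the style of draft-and-verify (speculative) decoding. The names `greedy_ar_next`, `toy_parallel_draft`, and `consensus_decode` are hypothetical, and both "views" are toy functions rather than real models.

```python
from typing import Callable, List, Tuple

Token = int


def greedy_ar_next(context: List[Token]) -> Token:
    """Toy stand-in for the autoregressive view (assumption, not a real LLM)."""
    return (sum(context) * 31 + len(context)) % 50


def toy_parallel_draft(context: List[Token], k: int) -> List[Token]:
    """Toy stand-in for the parallel view: proposes k tokens in one call.

    It is deliberately wrong at some positions so that verification matters.
    """
    ctx = list(context)
    draft = []
    for _ in range(k):
        tok = greedy_ar_next(ctx)
        if len(ctx) % 4 == 3:
            tok = (tok + 1) % 50  # injected drafting error
        draft.append(tok)
        ctx.append(tok)
    return draft


def ar_decode(prompt: List[Token], num_new: int,
              ar_fn: Callable[[List[Token]], Token]) -> List[Token]:
    """Baseline: plain sequential greedy decoding, one token per step."""
    out = list(prompt)
    for _ in range(num_new):
        out.append(ar_fn(out))
    return out


def consensus_decode(prompt: List[Token], num_new: int, k: int,
                     draft_fn: Callable[[List[Token], int], List[Token]],
                     ar_fn: Callable[[List[Token]], Token]
                     ) -> Tuple[List[Token], int]:
    """Draft up to k tokens per step and keep only what the AR view confirms.

    On the first disagreement the AR token is used and the rest of the draft
    is discarded, so the output always equals the pure AR output.
    A real system would verify the draft in one batched forward pass;
    the inner loop here is sequential only to keep the sketch simple.
    """
    out = list(prompt)
    target = len(prompt) + num_new
    steps = 0
    while len(out) < target:
        steps += 1
        draft = draft_fn(out, min(k, target - len(out)))
        for tok in draft:
            expected = ar_fn(out)
            out.append(expected)  # the AR token is always the one kept
            if tok != expected:
                break  # consensus failed: drop the remaining draft tokens
    return out, steps


if __name__ == "__main__":
    prompt = [1, 2, 3]
    tokens, steps = consensus_decode(prompt, 12, 4,
                                     toy_parallel_draft, greedy_ar_next)
    print("matches AR baseline:", tokens == ar_decode(prompt, 12, greedy_ar_next))
    print("decoding steps:", steps)
```

Under this assumption the speedup comes from the number of decoding steps dropping below the number of generated tokens whenever drafts are accepted, while the output stays identical to the sequential baseline even when every draft is wrong.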

Code & Models

Repositories

chiennv2000/orthrus (GitHub)
