On the Quantization Robustness of Diffusion Language Models in Coding Benchmarks

Aarav Gupta; Gururaj Deshpande; Chandreyi Chakraborty

arXiv:2604.20079·cs.LG·April 23, 2026

TL;DR

This paper applies post-training quantization to a diffusion-based coding language model (CoDA) and reports that, in the authors' setup, it loses less accuracy at low bitwidths than its auto-regressive counterpart, Qwen3-1.7B, on the HumanEval and MBPP coding benchmarks.

Contribution

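It evaluates and compares the quantization robustness of a diffusion LLM against an auto-regressive model under a standardized evaluation pipeline, a setting the authors describe as sparsely explored, and highlights the potential of diffusion LLMs for efficient deployment.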

Findings

01

In the authors' setup, the diffusion LLM (CoDA) is more robust at low bitwidths (2-4 bits) than the auto-regressive baseline (Qwen3-1.7B).

02

GPTQ and a modified Hessian-Aware Quantization (HAWQ) algorithm cause smaller accuracy degradation on CoDA than on Qwen3-1.7B across HumanEval and MBPP.

03

Mixed-precision configurations enable smooth trade-offs between accuracy, latency, and memory.
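
The memory side of that trade-off can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the layer names, parameter counts, and bitwidth assignments are hypothetical, and the calculation ignores the overhead of quantization scales and zero-points. It only shows how assigning different bitwidths per layer changes the weight footprint and the average bits per parameter.

```python
# Hypothetical parameter counts per layer group (assumed for illustration,
# not the sizes of CoDA or Qwen3-1.7B).
LAYER_PARAMS = {
    "embed": 1_000_000,
    "attn": 4_000_000,
    "mlp": 8_000_000,
}


def total_weight_bits(layer_params, bits_per_layer):
    """Sum of (parameter count x assigned bitwidth) over all layers."""
    return sum(n * bits_per_layer[name] for name, n in layer_params.items())


def weight_memory_bytes(layer_params, bits_per_layer):
    """Weight storage in bytes, ignoring scale and zero-point overhead."""
    return total_weight_bits(layer_params, bits_per_layer) / 8


def average_bits(layer_params, bits_per_layer):
    """Average number of bits stored per parameter."""
    return total_weight_bits(layer_params, bits_per_layer) / sum(layer_params.values())


# Two hypothetical configurations: uniform 4-bit versus a mixed assignment.
uniform_4bit = {name: 4 for name in LAYER_PARAMS}
mixed = {"embed": 8, "attn": 4, "mlp": 2}

if __name__ == "__main__":
    for label, cfg in (("uniform 4-bit", uniform_4bit), ("mixed 8/4/2", mixed)):
        mem = weight_memory_bytes(LAYER_PARAMS, cfg)
        avg = average_bits(LAYER_PARAMS, cfg)
        print(f"{label}: {mem / 1e6:.2f} MB, {avg:.2f} bits/param")
```

Which layers can tolerate fewer bits is exactly what a sensitivity-based method such as HAWQ is meant to decide; the assignment above is arbitrary and only demonstrates the bookkeeping.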

Abstract

Auto-regressive Large Language Models (LLMs) achieve strong performance on coding tasks, but incur high memory and inference costs. Diffusion-based language models (d-LLMs) offer bounded inference cost via iterative denoising, but their behavior under post-training quantization (PTQ) has been sparsely explored. We investigate the application and robustness of PTQ techniques, specifically GPTQ and a modified Hessian-Aware Quantization (HAWQ) algorithm, on a diffusion-based coding LLM (CoDA) and observe that these methods applied to CoDA exhibit greater robustness at low bitwidths compared to Qwen3-1.7B, its auto-regressive counterpart, under a standardized evaluation pipeline. We find that in our setup, CoDA exhibits greater robustness at low bitwidths (2-4 bits), with smaller accuracy degradation across HumanEval and MBPP benchmarks. Additionally, mixed-precision configurations derived…
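
To make the notion of "low bitwidths" concrete, here is a minimal sketch of symmetric round-to-nearest weight quantization at several bitwidths. This is a deliberately simpler baseline than GPTQ or HAWQ and is not the authors' pipeline; the function names, the Gaussian toy weights, and the per-tensor scaling are all assumptions made for illustration. It only shows why reconstruction error grows quickly as the bitwidth drops toward 2 bits.

```python
import random


def quantize_rtn(weights, bits):
    """Symmetric per-tensor round-to-nearest quantization.

    Returns the dequantized (reconstructed) weights as floats.
    """
    qmax = 2 ** (bits - 1) - 1  # largest positive integer level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for w in weights:
        q = round(w / scale)
        q = max(-qmax - 1, min(qmax, q))  # clamp to the signed integer range
        out.append(q * scale)
    return out


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy weights drawn from a standard normal (an assumption, not real model weights).
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]

# Reconstruction error at each bitwidth.
errors = {bits: mse(weights, quantize_rtn(weights, bits)) for bits in (2, 3, 4, 8)}

if __name__ == "__main__":
    for bits, err in errors.items():
        print(f"{bits}-bit MSE: {err:.6f}")
```

Weight reconstruction error is only a proxy: the paper measures robustness through benchmark accuracy on HumanEval and MBPP, which this toy example does not attempt to reproduce.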
