DiffScore: Text Evaluation Beyond Autoregressive Likelihood

Wen Lai; Yingli Shen; Dingnan Jin; Qing Cui; Jun Zhou; Maosong Sun; Alexander Fraser

arXiv:2605.11601·cs.CL·May 13, 2026



TL;DR

DiffScore is a bidirectional text evaluation method based on masked reconstruction. It avoids the positional bias of autoregressive scoring, adds diagnostic tools, and outperforms autoregressive baselines on ten benchmarks.

Contribution

The paper presents DiffScore, an evaluation framework built on masked diffusion language models that eliminates positional bias and offers diagnostic tools unavailable to autoregressive frameworks.

Findings

01

DiffScore outperforms autoregressive baselines on ten benchmarks.

02

Multi-timestep quality profiles decompose scores across masking rates.

03

Bidirectional PMI decomposition disentangles fluency from faithfulness.

Abstract

Autoregressive language models are widely used for text evaluation; however, their left-to-right factorization introduces positional bias: early tokens are scored with only leftward context, which conflates architectural asymmetry with true text quality. We propose masked reconstruction as an alternative paradigm, where every token is scored using full bidirectional context. We introduce DiffScore, an evaluation framework built on Masked Large Diffusion Language Models. By measuring text recoverability across continuous masking rates, DiffScore eliminates positional bias and naturally establishes an evaluation hierarchy from local fluency to global coherence. We further provide diagnostic tools unavailable to autoregressive frameworks: multi-timestep quality profiles that decompose scores across masking rates, and bidirectional PMI decomposition that disentangles fluency from…
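
The masked-reconstruction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Denoiser` interface, the `toy_denoiser` stand-in, the discrete grid of masking rates, and the mean log-probability aggregation are all assumptions made for illustration. A real setup would replace `toy_denoiser` with a masked diffusion language model.

```python
import math
import random
from typing import Callable, Dict, Sequence

MASK = "<mask>"

# Hypothetical interface (an assumption, not the paper's API): given a
# partially masked sequence, a position, and the original token, return
# the model's probability of that token at that position.
Denoiser = Callable[[Sequence[str], int, str], float]


def masked_recoverability(tokens: Sequence[str], denoiser: Denoiser,
                          rate: float, n_samples: int = 8,
                          seed: int = 0) -> float:
    """Mean log-probability of the original tokens at masked positions."""
    rng = random.Random(seed)
    n_mask = max(1, round(rate * len(tokens)))
    total, count = 0.0, 0
    for _ in range(n_samples):
        # Mask a random subset; every masked token sees context on both sides.
        masked = set(rng.sample(range(len(tokens)), n_mask))
        corrupted = [MASK if i in masked else t for i, t in enumerate(tokens)]
        for i in masked:
            total += math.log(denoiser(corrupted, i, tokens[i]))
            count += 1
    return total / count


def quality_profile(tokens: Sequence[str], denoiser: Denoiser,
                    rates: Sequence[float] = (0.15, 0.5, 0.85)) -> Dict[float, float]:
    """Score the same text at several masking rates (a discrete stand-in
    for the continuous range described in the abstract)."""
    return {r: masked_recoverability(tokens, denoiser, r) for r in rates}


def toy_denoiser(corrupted: Sequence[str], i: int, target: str) -> float:
    """Toy stand-in for a real model: more confident when neighbours are visible."""
    left = i > 0 and corrupted[i - 1] != MASK
    right = i < len(corrupted) - 1 and corrupted[i + 1] != MASK
    return 0.2 + 0.3 * left + 0.3 * right


if __name__ == "__main__":
    profile = quality_profile("the cat sat on the mat".split(), toy_denoiser)
    for rate, score in profile.items():
        print(f"mask rate {rate:.2f}: mean log-prob {score:.3f}")
```

In this sketch, low masking rates leave most of the local context visible, while high rates force reconstruction from sparse evidence, which loosely mirrors the local-fluency-to-global-coherence hierarchy the abstract describes.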


Code & Models

Repositories

wenlai-lavine/DiffScore (GitHub)
