Geometry-Aware Decoding with Wasserstein-Regularized Truncation and Mass Penalties for Large Language Models

Arash Gholami Davoodi; Navid Rezazadeh; Seyed Pouyan Mousavi Davoudi; Pouya Pezeshkpour

arXiv:2602.10346·cs.CL·May 15, 2026

TL;DR

This paper introduces Top-W, a geometry-aware truncation method for large language models that uses Wasserstein distance to improve decoding quality by balancing diversity and coherence.

Contribution

The paper proposes a novel Wasserstein-regularized truncation rule that explicitly considers token embedding geometry, outperforming heuristic methods in language model decoding.

Findings

01

Top-W achieves up to 33.7% improvement over prior methods.

02

It enhances both accuracy and creativity in language model outputs.

03

The method is efficient and compatible with standard decoding routines.

Abstract

Large language models (LLMs) must balance diversity and creativity against logical coherence in open-ended generation. Existing truncation-based samplers are effective but largely heuristic: they rely mainly on probability mass and entropy and ignore the semantic geometry of the token space. We present Top-W, a geometry-aware truncation rule that uses Wasserstein distance, defined over token-embedding geometry, to keep the cropped distribution close to the original while explicitly balancing retained probability mass against the entropy of the kept set. Our theory yields a simple closed-form structure for the fixed-potential subset update: depending on the mass-entropy trade-off, the optimal crop either collapses to a single token or takes the form of a one-dimensional prefix that can be found efficiently with a linear scan. We implement Top-W using efficient geometry-based potentials…
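
The prefix-by-linear-scan structure described in the abstract can be illustrated with a small sketch. This is not the paper's objective or implementation: the function name `prefix_scan_crop`, the inputs `scores` and `drop_costs` (a stand-in for geometry-based potentials), the weights `lam` and `beta`, and the objective itself are all hypothetical, chosen only to show how running sums let every prefix be evaluated in one pass after sorting.

```python
import numpy as np

def prefix_scan_crop(probs, scores, drop_costs, lam=1.0, beta=0.1):
    """Pick the best prefix of tokens under an ILLUSTRATIVE objective.

    Hypothetical objective for a kept prefix S (not taken from the paper):
        sum_{i not in S} p_i * drop_costs[i]   # transport-cost proxy
        + lam  * (1 - mass(S))                 # mass penalty
        + beta * entropy(p_S / mass(S))        # entropy of the kept set
    """
    probs = np.asarray(probs, dtype=float)
    # Order tokens along one dimension by the caller-supplied score.
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    p = probs[order]
    c = np.asarray(drop_costs, dtype=float)[order]
    if p[0] <= 0.0:
        raise ValueError("top-scored token must have positive probability")

    # Running sums: each prefix is then scored in O(1).
    mass = np.cumsum(p)
    kept_cost = np.cumsum(p * c)
    transport = kept_cost[-1] - kept_cost  # cost carried by dropped tokens
    plogp = np.cumsum(p * np.log(np.where(p > 0, p, 1.0)))
    # Entropy of the renormalized prefix: log M - (1/M) * sum p_i log p_i.
    entropy = np.log(mass) - plogp / mass

    objective = transport + lam * (1.0 - mass) + beta * entropy
    k = int(np.argmin(objective)) + 1  # prefix length with the lowest value

    kept = order[:k]
    q = np.zeros_like(probs)
    q[kept] = probs[kept] / probs[kept].sum()  # renormalized cropped distribution
    return kept, q
```

In this toy objective, a large `beta` with no mass or transport penalty makes the single top-scored token optimal, while a large `lam` pushes the scan toward longer prefixes. This loosely mirrors the single-token versus prefix dichotomy the abstract mentions, under the stated assumptions only.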
