DMAP: A Distribution Map for Text

Tom Kempton; Julia Rozanova; Parameswaran Kamalaruban; Maeve Madigan; Karolina Wresilo; Yoann L. Launay; David Sutton; Stuart Burrell

arXiv:2602.11871 · cs.CL · May 15, 2026

TL;DR

DMAP is a mathematically grounded method that maps a text, via a language model, to a set of samples in the unit interval that jointly encode rank and probability information, enabling efficient, model-agnostic analysis of both generated and real text.

Contribution

Introduces DMAP, a model-agnostic, mathematically grounded method for analyzing text with language models, with applications including data integrity validation and forensic analysis.

Findings

01. DMAP provides a unified statistical view of text.

02. It is simple to compute on consumer hardware.

03. It demonstrates utility in validation, detection, and forensic analysis.

Abstract

Large Language Models (LLMs) are a powerful tool for statistical text analysis, with derived sequences of next-token probability distributions offering a wealth of information. Extracting this signal typically relies on metrics such as perplexity, which do not adequately account for context; how one should interpret a given next-token probability is dependent on the number of reasonable choices encoded by the shape of the conditional distribution. In this work, we present DMAP, a mathematically grounded method that maps a text, via a language model, to a set of samples in the unit interval that jointly encode rank and probability information. This representation enables efficient, model-agnostic analysis and supports a range of applications. We illustrate its utility through three case studies: (i) validation of generation parameters to ensure data integrity, (ii) examining the role of…
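
The abstract does not spell out the map itself, so the following is only an illustrative sketch of one way a token could be sent to a point in the unit interval that reflects both its rank and its probability. It is an assumption, not the authors' definition. The function name `unit_interval_sample` and the toy distributions are hypothetical. The idea sketched here: sort the vocabulary by probability, give each token a sub-interval whose length is its probability, and draw a uniform point inside the observed token's sub-interval.

```python
import numpy as np

def unit_interval_sample(probs, token_id, rng):
    """Hypothetical map from one next-token distribution to a point in [0, 1].

    Assumption for illustration only: this is not taken from the paper.
    """
    probs = np.asarray(probs, dtype=float)
    p = probs[token_id]
    ids = np.arange(len(probs))
    # Tokens ranked ahead of the observed one: strictly more probable,
    # or equally probable with a smaller id (a fixed tie-break rule).
    ahead = (probs > p) | ((probs == p) & (ids < token_id))
    lower = probs[ahead].sum()
    # Uniform point inside the observed token's sub-interval [lower, lower + p).
    return lower + p * rng.uniform()

rng = np.random.default_rng(0)

# Two toy contexts in which the observed token has the same probability, 0.2.
flat = [0.2, 0.2, 0.2, 0.2, 0.2]      # five equally reasonable choices
peaked = [0.75, 0.2, 0.03, 0.02]      # one dominant choice

s_flat = unit_interval_sample(flat, 0, rng)      # falls in [0.0, 0.2)
s_peaked = unit_interval_sample(peaked, 1, rng)  # falls in [0.75, 0.95)
```

Both tokens contribute the same term to perplexity, yet they land in different parts of the unit interval, which is the kind of context dependence the abstract says perplexity misses. Under this particular construction, tokens sampled from the model's own distribution would produce samples that are uniform on the unit interval.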

Videos

DMAP: A Distribution Map for Text · slideslive