Same Graph, Different Likelihoods: Calibration of Autoregressive Graph Generators via Permutation-Equivalent Encodings

Laurits Fredsgaard; Aaron Thomas; Michael Riis Andersen; Mikkel N. Schmidt; Mahito Sugiyama

arXiv:2604.05613·cs.LG·April 8, 2026



TL;DR

This paper studies how well calibrated the likelihoods of autoregressive graph generators are. It introduces Linearization Uncertainty (LU), a measure of how consistent a model's likelihoods are across equivalent linearizations of the same graph, and evaluates it on molecular graph generation.

Contribution

The authors propose Linearization Uncertainty as a new metric for assessing likelihood calibration in graph generators and show its advantages over traditional NLL-based evaluation.

Findings

01

Models trained on biased orderings achieve lower NLL on their native linearization order but show much higher expected calibration error (ECE), by two orders of magnitude, under random permutation.

02

LU correlates strongly with molecular stability, outperforming NLL in quality assessment.

03

Permutation-based evaluation provides a more reliable measure of generated graph quality.
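The calibration error mentioned in the findings can be illustrated with the standard binned form of expected calibration error. This is a minimal sketch, not the paper's evaluation code: the function name, the ten equal-width bins, and the use of per-prediction confidence with a correct/incorrect flag are all assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: weighted mean of |accuracy - confidence| per bin.

    confidences: predicted probability of the chosen token, each in [0, 1].
    correct:     booleans, True where the chosen token was the right one.
    Both names and the binning scheme are illustrative assumptions.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("need at least one prediction")

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Hypothetical predictions: the model claims 90% confidence but is right 3 times out of 4.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

Comparing this quantity on native-order sequences and on randomly permuted but equivalent sequences is one way to picture the gap the first finding describes.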

Abstract

Autoregressive graph generators define likelihoods via a sequential construction process, but these likelihoods are only meaningful if they are consistent across all linearizations of the same graph. Segmented Eulerian Neighborhood Trails (SENT), a recent linearization method, converts graphs into sequences that can be perfectly decoded and efficiently processed by language models, but admits multiple equivalent linearizations of the same graph. We quantify violations in assigned negative log-likelihood (NLL) using the coefficient of variation across equivalent linearizations, which we call Linearization Uncertainty (LU). Training transformers under four linearization strategies on two datasets, we show that biased orderings achieve lower NLL on their native order but exhibit expected calibration error (ECE) two orders of magnitude higher under random permutation, indicating that these…
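The abstract defines LU as the coefficient of variation of the NLL across equivalent linearizations of one graph. A minimal sketch of that computation follows. The function name, the use of the population standard deviation (rather than the sample one), and the example NLL values are assumptions for illustration and are not taken from the paper or its repository.

```python
from statistics import mean, pstdev


def linearization_uncertainty(nlls):
    """Coefficient of variation (std / mean) of NLLs assigned to one graph.

    nlls: NLL values the model assigns to different equivalent
    linearizations of the same graph. Population std is an assumption.
    """
    if len(nlls) < 2:
        raise ValueError("need at least two linearizations of the graph")
    mu = mean(nlls)
    if mu == 0:
        raise ValueError("mean NLL is zero; coefficient of variation is undefined")
    return pstdev(nlls) / mu


# Hypothetical NLLs for three equivalent linearizations of one graph.
consistent_model = linearization_uncertainty([12.0, 12.0, 12.0])
inconsistent_model = linearization_uncertainty([8.0, 12.0, 16.0])
```

A model whose likelihood does not depend on the linearization gives an LU of zero for every graph, and larger values mean the same graph receives more widely varying likelihoods.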


Code & Models

Repositories

lauritsf/linearization-uncertainty (GitHub)
