Cross-Model Disagreement as a Label-Free Correctness Signal

Matt Gorbett; Suman Jana

arXiv:2603.25450 · cs.AI · March 27, 2026

TL;DR

This paper introduces cross-model disagreement, a training-free, label-free signal for detecting when a language model is wrong. It targets confident errors in particular, where the model's own uncertainty measures fail, and is aimed at safer deployment.

Contribution

The authors propose two cross-model disagreement metrics, CMP and CME, which outperform within-model uncertainty measures at identifying model errors and require no additional training.

Findings

01. CMP achieves 0.75 AUROC on MMLU, outperforming baselines.

02. CME and CMP outperform within-model uncertainty on multiple benchmarks.

03. The method is applicable to deployment monitoring and model oversight.
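
For readers unfamiliar with the metric in the first finding, AUROC here can be read as the probability that a randomly chosen wrong answer receives a higher disagreement score than a randomly chosen correct one. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy scores are illustrative assumptions, not taken from the paper; correctness labels are used only to evaluate the signal, not to compute it.

```python
from typing import Sequence


def auroc(scores: Sequence[float], is_error: Sequence[bool]) -> float:
    """Pairwise AUROC: P(score of an error > score of a correct answer).

    Ties count as half a win. Labels are used for evaluation only.
    """
    errors = [s for s, e in zip(scores, is_error) if e]
    correct = [s for s, e in zip(scores, is_error) if not e]
    if not errors or not correct:
        raise ValueError("need at least one error and one correct answer")
    wins = 0.0
    for e in errors:
        for c in correct:
            if e > c:
                wins += 1.0
            elif e == c:
                wins += 0.5
    return wins / (len(errors) * len(correct))


# Toy, made-up disagreement scores (hypothetical, not results from the paper).
toy_scores = [0.9, 0.6, 0.2, 0.5, 0.1]
toy_is_error = [True, True, True, False, False]
toy_auroc = auroc(toy_scores, toy_is_error)
```

An AUROC of 0.5 corresponds to a signal that ranks errors no better than chance, and 1.0 to a signal that ranks every error above every correct answer.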

Abstract

Detecting when a language model is wrong without ground truth labels is a fundamental challenge for safe deployment. Existing approaches rely on a model's own uncertainty -- such as token entropy or confidence scores -- but these signals fail critically on the most dangerous failure mode: confident errors, where a model is wrong but certain. In this work we introduce cross-model disagreement as a correctness indicator -- a simple, training-free signal that can be dropped into existing production systems, routing pipelines, and deployment monitoring infrastructure without modification. Given a model's generated answer, cross-model disagreement computes how surprised or uncertain a second verifier model is when reading that answer via a single forward pass. No generation from the verifying model is required, and no correctness labels are needed. We instantiate this principle as…
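
The abstract's core idea, scoring a generated answer by how surprised or uncertain a second model is when reading it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the excerpt is truncated before CMP and CME are defined, so the two scores below (perplexity of the answer tokens and mean predictive entropy under the verifier) are assumed stand-ins for "surprise" and "uncertainty". The function names and the plain-list logit format are hypothetical; in practice the logits would come from one forward pass of the verifier over the prompt plus the answer.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def disagreement_scores(
    verifier_logits: Sequence[Sequence[float]],
    answer_ids: Sequence[int],
) -> Tuple[float, float]:
    """Return (perplexity, mean_entropy) of the verifier on the answer.

    verifier_logits[t] is assumed to be the verifier's logit vector for
    predicting answer_ids[t], taken from a single forward pass. No text is
    generated by the verifier and no correctness labels are used.
    """
    if not answer_ids or len(verifier_logits) != len(answer_ids):
        raise ValueError("need one logit vector per answer token")
    total_nll = 0.0
    total_entropy = 0.0
    for logits, token in zip(verifier_logits, answer_ids):
        log_probs = log_softmax(logits)
        total_nll -= log_probs[token]  # surprise at the actual answer token
        total_entropy -= sum(math.exp(lp) * lp for lp in log_probs)
    n = len(answer_ids)
    return math.exp(total_nll / n), total_entropy / n


# Toy check: a verifier that strongly expects token 0 at every position.
confident = [[10.0, 0.0, 0.0, 0.0]]
ppl_agree, _ = disagreement_scores(confident, [0])     # answer matches verifier
ppl_disagree, _ = disagreement_scores(confident, [1])  # answer contradicts it
```

In the toy check the second answer contradicts a confident verifier, so its perplexity is much higher than the first. Higher scores would flag the answer for review or routing under this reading of the method.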

Taxonomy

Topics: Software System Performance and Reliability · Adversarial Robustness in Machine Learning · Topic Modeling