Cross-Context Verification: Hierarchical Detection of Benchmark Contamination through Session-Isolated Analysis

Tae-Eun Song

arXiv:2603.21454·cs.CL·April 2, 2026

TL;DR

This paper introduces a hierarchical, multi-agent framework called Cross-Context Verification (CCV) for detecting benchmark contamination in large language models, achieving perfect separation between contaminated and genuine reasoning.

Contribution

It presents a novel black-box, multi-session detection method and a hierarchical analysis architecture that significantly improves contamination detection accuracy over existing approaches.

Findings

01. Contamination is binary: models either recall perfectly or not at all.

02. Reasoning absence is a perfect discriminator.

03. 33% of prior contamination labels are false positives.

Abstract

LLM coding benchmarks face a credibility crisis: widespread solution leakage and test quality issues undermine SWE-bench Verified, while existing detection methods (paraphrase consistency, n-gram overlap, perplexity analysis) never directly observe whether a model reasons or recalls. Meanwhile, simply repeating verification degrades accuracy: multi-turn review generates false positives faster than it discovers true errors, suggesting that structural approaches are needed. We introduce Cross-Context Verification (CCV), a black-box method that solves the same benchmark problem in N independent sessions and measures solution diversity, combined with the Hierarchical Cross-Context Architecture (HCCA), a multi-agent analysis framework that prevents confirmation bias through intentional information restriction across specialized analytical roles. On 9 SWE-bench Verified problems (45…
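
The core CCV idea, solving one problem in N independent sessions and measuring how much the solutions differ, can be illustrated with a minimal sketch. The abstract excerpt does not say which diversity metric the paper uses, so the function below is an assumption: the name `solution_diversity` and the choice of character-level similarity from Python's `difflib` are hypothetical stand-ins, not the author's method.

```python
from difflib import SequenceMatcher
from itertools import combinations


def solution_diversity(solutions):
    """Return 1 minus the mean pairwise similarity of the given solutions.

    `solutions` holds one solution string per independent session.
    The similarity measure (difflib's ratio) is an illustrative assumption,
    not the metric defined in the paper.
    """
    if len(solutions) < 2:
        raise ValueError("need at least two sessions to measure diversity")
    # Compare every unordered pair of session outputs.
    similarities = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(solutions, 2)
    ]
    return 1.0 - sum(similarities) / len(similarities)


# Hypothetical session outputs: identical patches versus varied ones.
recalled = ["return x + 1"] * 3
varied = ["return x + 1", "y = x + 1\nreturn y", "return sum([x, 1])"]
print(solution_diversity(recalled))
print(solution_diversity(varied))
```

Under this sketch, sessions that return the same text score zero diversity, which is the pattern one might expect from recall, while independently reasoned solutions would tend to score higher. Any decision threshold would have to come from the paper itself.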
