Are Large Language Models Truly Smarter Than Humans?

Eshwar Reddy M; Sourav Karmakar

arXiv:2603.16197 · cs.AI · March 18, 2026

TL;DR

This paper audits six frontier large language models for benchmark contamination. A lexical detection pipeline flags 13.8% of 513 sampled MMLU questions as contaminated, with estimated performance gains of +0.030 to +0.054 accuracy points by category.

Contribution

It introduces three complementary experiments to detect data contamination in LLMs, produces a detailed contamination ranking, and analyzes the effect of contamination on model performance.

Findings

01

13.8% overall contamination rate across 513 MMLU questions (18.1% in STEM, up to 66.7% in Philosophy)

02

Estimated performance gains of +0.030 to +0.054 accuracy points by category, attributed to contamination

03

Memorization signals above chance in 72.5% of cases

Abstract

Public leaderboards increasingly suggest that large language models (LLMs) surpass human experts on benchmarks spanning academic knowledge, law, and programming. Yet most benchmarks are fully public, their questions widely mirrored across the internet, creating systematic risk that models were trained on the very data used to evaluate them. This paper presents three complementary experiments forming a rigorous multi-method contamination audit of six frontier LLMs: GPT-4o, GPT-4o-mini, DeepSeek-R1, DeepSeek-V3, Llama-3.3-70B, and Qwen3-235B. Experiment 1 applies a lexical contamination detection pipeline to 513 MMLU questions across all 57 subjects, finding an overall contamination rate of 13.8% (18.1% in STEM, up to 66.7% in Philosophy) and estimated performance gains of +0.030 to +0.054 accuracy points by category. Experiment 2 applies a paraphrase and indirect-reference diagnostic to…
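
The abstract names a lexical contamination detection pipeline but this page does not describe how it works. As a rough illustration only, and not the authors' method, the following Python sketch shows one common way such a check can be built: flag a question when enough of its word n-grams also occur in some reference document, then report the flagged fraction. The function names, the n-gram length `n`, and the `threshold` are all hypothetical choices made for this example.

```python
import re


def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(question, document, n):
    """Fraction of the question's n-grams that also appear in the document."""
    q = ngrams(question, n)
    if not q:
        return 0.0  # question shorter than n tokens: nothing to compare
    return len(q & ngrams(document, n)) / len(q)


def contamination_rate(questions, corpus, n=5, threshold=0.5):
    """Share of questions whose overlap with any document meets the threshold.

    `n` and `threshold` are illustrative assumptions, not values from the paper.
    """
    if not questions:
        return 0.0
    flagged = [
        q for q in questions
        if any(overlap_ratio(q, doc, n) >= threshold for doc in corpus)
    ]
    return len(flagged) / len(questions)


# Toy data invented for this example.
questions = [
    "What is the capital of France and its largest city?",
    "Which gas do plants absorb during photosynthesis in daylight?",
]
corpus = [
    "Quiz mirror: What is the capital of France and its largest city? Answer: Paris.",
]

rate = contamination_rate(questions, corpus)
print(rate)  # 0.5: only the first question is mirrored in the corpus
```

For scale, a 13.8% rate over 513 questions corresponds to roughly 71 flagged questions (71 / 513 ≈ 0.138).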

Taxonomy

Topics: Computational and Text Analysis Methods · Artificial Intelligence in Healthcare and Education · Topic Modeling