Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis

Kejian Zhu; Shangqing Tu; Zhuoran Jin; Lei Hou; Juanzi Li; Jun Zhao

arXiv:2506.04142 · cs.CL · June 5, 2025

TL;DR

This paper introduces a method that makes large language model evaluation more trustworthy by identifying and suppressing shortcut neurons, which reduces the effect of benchmark contamination. The resulting evaluation scores correlate highly with an established benchmark (MixEval).

Contribution

The paper proposes shortcut neuron analysis and patching to mitigate contamination in LLM evaluation, improving reliability without building new benchmarks.

Findings

01 High correlation ($\rho > 0.95$) with the MixEval benchmark

02 Suppressing shortcut neurons improves evaluation accuracy

03 Method generalizes across different benchmarks and settings
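
To make the first finding concrete, the following is a minimal sketch of how a correlation between two sets of model scores could be computed. It assumes that $\rho$ denotes a rank correlation such as Spearman's, which this summary does not state. The score values and the names `patched_scores` and `reference_scores` are invented purely for illustration and are not results from the paper.

```python
import numpy as np

def spearman_rho(a, b):
    """Spearman rank correlation for score lists without ties."""
    # argsort of argsort turns each value into its 0-based rank.
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    # Spearman's rho is the Pearson correlation of the ranks.
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Hypothetical scores for five models (illustrative numbers only).
patched_scores = np.array([61.2, 55.0, 70.3, 48.9, 66.1])    # scores under the patched evaluation
reference_scores = np.array([58.0, 53.5, 72.4, 50.1, 56.0])  # scores on a reference benchmark

rho = spearman_rho(patched_scores, reference_scores)
print(f"rho = {rho:.2f}")
```

In this toy case two models swap adjacent ranks, so the correlation falls below 1; a value above 0.95 would require the two rankings to agree almost everywhere.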

Abstract

The development of large language models (LLMs) depends on trustworthy evaluation. However, most current evaluations rely on public benchmarks, which are prone to data contamination issues that significantly compromise fairness. Previous research has focused on constructing dynamic benchmarks to address contamination. However, continuously building new benchmarks is costly and cyclical. In this work, we aim to tackle contamination by analyzing the mechanisms of contaminated models themselves. Through our experiments, we discover that the overestimation of contaminated models is likely due to parameters acquiring shortcut solutions in training. We further propose a novel method for identifying shortcut neurons through comparative and causal analysis. Building on this, we introduce an evaluation method called shortcut neuron patching to suppress shortcut neurons. Experiments validate…
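
The general idea of comparing activations and then patching selected neurons can be sketched on a toy network. This is not the authors' implementation: it assumes a one-hidden-layer ReLU network in NumPy, a hypothetical "clean" reference model that differs from the "contaminated" one in only three hidden neurons, and a simple mean absolute activation difference as the comparative score. It omits the causal analysis step entirely. All names (`comparative_scores`, `forward_patched`, `true_shortcut`) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 16, 4

# Hypothetical "clean" reference model: one hidden ReLU layer.
W1_clean = rng.normal(size=(n_in, n_hidden))
W2 = rng.normal(size=(n_hidden, n_out))

# Toy "contaminated" model: identical except for three hidden neurons,
# which stand in for shortcut neurons (an assumption of this sketch).
true_shortcut = np.array([2, 7, 11])
W1_cont = W1_clean.copy()
W1_cont[:, true_shortcut] += rng.normal(scale=2.0, size=(n_in, 3))

def hidden(x, W1):
    """Hidden-layer activations after ReLU."""
    return np.maximum(x @ W1, 0.0)

def comparative_scores(x, W1_a, W1_b):
    """Mean absolute activation difference per hidden neuron."""
    return np.abs(hidden(x, W1_a) - hidden(x, W1_b)).mean(axis=0)

def forward_patched(x, W1_target, W1_source, neurons):
    """Run the target model, overwriting the chosen neurons' activations
    with those of the source model before the output layer."""
    h = hidden(x, W1_target)
    h[:, neurons] = hidden(x, W1_source)[:, neurons]
    return h @ W2

x = rng.normal(size=(32, n_in))  # stand-in for benchmark inputs

# Comparative step: rank neurons by how differently the two models use them.
scores = comparative_scores(x, W1_cont, W1_clean)
candidates = np.sort(np.argsort(scores)[-3:])

clean_out = hidden(x, W1_clean) @ W2
cont_out = hidden(x, W1_cont) @ W2
patched_out = forward_patched(x, W1_cont, W1_clean, candidates)

print("candidate neurons:", candidates.tolist())
print("max gap before patching:", np.abs(cont_out - clean_out).max())
print("max gap after patching: ", np.abs(patched_out - clean_out).max())
```

Because the two toy models differ only in the three perturbed neurons, patching the top-scoring candidates brings the contaminated model's outputs back to those of the reference model. In a real LLM the differences would be spread across many layers and neurons, so the candidate set would need additional validation.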

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)