ReLE: A Scalable System and Structured Benchmark for Diagnosing Capability Anisotropy in Chinese LLMs
Rui Fang, Jian Li, Wei Chen, Bin Hu, Ying-Cong Chen, Xin Tang, Liang Diao

TL;DR
ReLE is a scalable, structured evaluation system for diagnosing capability anisotropy in Chinese LLMs, reducing costs and revealing model specialization and trade-offs across domains and capabilities.
Contribution
The paper introduces ReLE, a novel evaluation system with a hybrid scoring mechanism and variance-aware scheduler, enabling efficient, detailed analysis of Chinese LLMs' capabilities.
Findings
ReLE reduces evaluation costs by 70% while maintaining high ranking correlation.
Models show high specialization with a Rank Stability Amplitude of 11.4.
Evaluation reveals significant sensitivity of rankings to weighting schemes.
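The weighting sensitivity noted above can be illustrated with a minimal sketch. The model names, domain scores, and weighting schemes below are hypothetical and not taken from the paper, and the per-model rank range computed here is only an illustrative spread measure, not necessarily the paper's definition of Rank Stability Amplitude.

```python
# Hypothetical per-domain scores (0-100); these numbers are NOT from the paper.
scores = {
    "model_a": {"math": 90.0, "law": 60.0, "medicine": 70.0},
    "model_b": {"math": 70.0, "law": 85.0, "medicine": 75.0},
    "model_c": {"math": 80.0, "law": 75.0, "medicine": 60.0},
}

def rank_models(scores, weights):
    """Rank models (1 = best) by the weighted mean of their domain scores."""
    total = sum(weights.values())
    aggregate = {
        model: sum(weights[d] * per_domain[d] for d in weights) / total
        for model, per_domain in scores.items()
    }
    ordered = sorted(aggregate, key=aggregate.get, reverse=True)
    return {model: position + 1 for position, model in enumerate(ordered)}

def rank_range(scores, schemes):
    """For each model, the gap between its best and worst rank across schemes."""
    all_ranks = [rank_models(scores, weights) for weights in schemes]
    return {
        model: max(r[model] for r in all_ranks) - min(r[model] for r in all_ranks)
        for model in scores
    }

# Three assumed weighting schemes: uniform, math-heavy, law-heavy.
schemes = [
    {"math": 1, "law": 1, "medicine": 1},
    {"math": 3, "law": 1, "medicine": 1},
    {"math": 1, "law": 3, "medicine": 1},
]

for weights in schemes:
    print(weights, rank_models(scores, weights))
print("rank range per model:", rank_range(scores, schemes))
```

In this toy data the top-ranked model changes when the math weight is tripled, which is the kind of instability a single aggregate leaderboard score would hide.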
Abstract
Large Language Models (LLMs) have achieved rapid progress in Chinese language understanding, yet accurately evaluating their capabilities remains challenged by benchmark saturation and prohibitive computational costs. While static leaderboards provide snapshot rankings, they often mask the structural trade-offs between capabilities. In this work, we present ReLE (Robust Efficient Live Evaluation), a scalable system designed to diagnose Capability Anisotropy, the non-uniformity of model performance across domains. Using ReLE, we evaluate 304 models (189 commercial, 115 open-source) across a Domain × Capability orthogonal matrix comprising 207,843 samples. We introduce two methodological contributions to address current evaluation pitfalls: (1) A Symbolic-Grounded Hybrid Scoring Mechanism that eliminates embedding-based false positives in reasoning tasks; (2) A Dynamic…
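To make the first contribution concrete, here is a rough sketch of why similarity-based scoring can produce false positives on reasoning tasks and how a symbolic check avoids them. This is not the paper's implementation: `surface_similarity` uses `difflib` as a stand-in for an embedding similarity, and `extract_final_number`, `hybrid_score`, and the 0.8 threshold are hypothetical names and values chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from fractions import Fraction

def surface_similarity(a, b):
    """Stand-in for an embedding similarity (assumption): character-level ratio."""
    return SequenceMatcher(None, a, b).ratio()

def extract_final_number(text):
    """Return the last number in the text as an exact Fraction, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return Fraction(matches[-1]) if matches else None

def hybrid_score(prediction, reference, is_reasoning_task, threshold=0.8):
    """Score 1.0 or 0.0: exact symbolic match for reasoning, similarity otherwise."""
    if is_reasoning_task:
        pred = extract_final_number(prediction)
        ref = extract_final_number(reference)
        return 1.0 if pred is not None and pred == ref else 0.0
    return 1.0 if surface_similarity(prediction, reference) >= threshold else 0.0

reference = "The answer is 42"
wrong = "The answer is 43"

# The two strings are nearly identical, so a similarity threshold accepts the wrong answer.
print(surface_similarity(wrong, reference) >= 0.8)
# The symbolic path compares the extracted values and rejects it.
print(hybrid_score(wrong, reference, is_reasoning_task=True))
# A differently worded but numerically equal answer is accepted.
print(hybrid_score("So x = 42.0", reference, is_reasoning_task=True))
```

Comparing values as `Fraction` objects keeps the check exact, so "42" and "42.0" match while "42" and "43" do not, regardless of how similar the surrounding text is.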
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
