MMLU-Pro+: Evaluating Higher-Order Reasoning and Shortcut Learning in LLMs

Saeid Asgari Taghanaki; Aliasgahr Khani; Amir Khasahmadi

arXiv:2409.02257 · cs.CL · October 17, 2024

TL;DR

MMLU-Pro+ is a new benchmark that challenges large language models with complex questions that can have multiple correct answers, to better evaluate their reasoning and their resistance to shortcut learning. It reveals significant performance gaps among six state-of-the-art LLMs.

Contribution

The paper introduces MMLU-Pro+, an extension of MMLU-Pro, together with new metrics (shortcut selection ratio and correct pair identification ratio) for assessing higher-order reasoning and shortcut learning in LLMs.

Findings

01. MMLU-Pro+ maintains MMLU-Pro's difficulty while discriminating between models more rigorously.

02. Significant performance gaps were observed among six state-of-the-art LLMs.

03. The new metrics offer deeper insight into model reasoning and anchoring bias.

Abstract

Existing benchmarks for large language models (LLMs) increasingly struggle to differentiate between top-performing models, underscoring the need for more challenging evaluation frameworks. We introduce MMLU-Pro+, an enhanced benchmark building upon MMLU-Pro to assess shortcut learning and higher-order reasoning in LLMs. By incorporating questions with multiple correct answers across diverse domains, MMLU-Pro+ tests LLMs' ability to engage in complex reasoning and resist simplistic problem-solving strategies. Our results show that MMLU-Pro+ maintains MMLU-Pro's difficulty while providing a more rigorous test of model discrimination, particularly in multi-correct answer scenarios. We introduce novel metrics like shortcut selection ratio and correct pair identification ratio, offering deeper insights into model behavior and anchoring bias. Evaluations of six state-of-the-art LLMs reveal…
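The abstract does not spell out how the multi-correct items are built or scored. Purely as an illustration of why such items penalize shortcuts, the sketch below assumes one plausible construction: append a composite "Both X and Y are correct" option when two listed options are both true, key that composite option, and score by exact match, so a model that stops at the first plausible option is marked wrong. The names `Item`, `add_both_correct_option`, and `accuracy` are hypothetical; this is not the benchmark's published procedure, and it does not implement the paper's shortcut selection ratio or correct pair identification ratio.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Item:
    """A multiple-choice item; `key` is the index of the keyed (most complete) option."""
    stem: str
    options: Tuple[str, ...]
    key: int


def add_both_correct_option(stem: str, options: Sequence[str], first: int, second: int) -> Item:
    """Hypothetical construction: append 'Both X and Y are correct' and key it.

    Assumes options `first` and `second` are both factually correct, so the
    composite option is the most complete answer and becomes the key.
    """
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    composite = f"Both {letters[first]} and {letters[second]} are correct"
    new_options = tuple(options) + (composite,)
    return Item(stem, new_options, key=len(new_options) - 1)


def accuracy(items: List[Item], predictions: List[int]) -> float:
    """Exact-match accuracy: choosing only one of the two correct options scores 0."""
    if not items:
        return 0.0
    hits = sum(int(pred == item.key) for item, pred in zip(items, predictions))
    return hits / len(items)


if __name__ == "__main__":
    item = add_both_correct_option(
        "Which of the following are prime numbers?",
        ["2", "4", "7", "9"],
        first=0,
        second=2,
    )
    print(item.options[-1])       # Both A and C are correct
    print(accuracy([item], [0]))  # 0.0: stopped at the first plausible option
    print(accuracy([item], [4]))  # 1.0: chose the composite option
```

Keying the composite option means option A ("2") is true but incomplete, which is the kind of simplistic first-match strategy the benchmark is described as probing.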

Code & Models

Repositories

asgsaeid/mmlu-pro-plus (official)

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification