I-RAVEN-X: Benchmarking Generalization and Robustness of Analogical and Mathematical Reasoning in Large Language and Reasoning Models

Giacomo Camposampiero; Michael Hersche; Roger Wattenhofer; Abu Sebastian; Abbas Rahimi

arXiv:2510.17496 · cs.LG · November 3, 2025

TL;DR

I-RAVEN-X is a new benchmark that assesses the generalization and robustness of large language and reasoning models in analogical and mathematical reasoning, highlighting their strengths and current limitations.

Contribution

I-RAVEN-X extends the I-RAVEN benchmark to more complex reasoning scenarios and evaluates how LLMs and LRMs perform under these conditions.

Findings

01 LRMs outperform LLMs in productivity (longer reasoning relations) and systematicity (wider attribute ranges).

02 LRMs struggle with reasoning under uncertainty.

03 LRMs cannot effectively explore multiple probabilistic outcomes.

Abstract

We introduce I-RAVEN-X, a symbolic benchmark designed to evaluate generalization and robustness in analogical and mathematical reasoning for Large Language Models (LLMs) and Large Reasoning Models (LRMs). I-RAVEN-X extends I-RAVEN by increasing operand complexity and attribute range and by introducing perceptual uncertainty. Empirical results show that, compared to LLMs, LRMs achieve improved productivity and systematicity on longer reasoning relations and wider attribute ranges, respectively. However, LRMs are still significantly challenged by reasoning under uncertainty and cannot effectively explore multiple probabilistic outcomes.
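
The three axes named in the abstract (operand complexity, attribute range, perceptual uncertainty) can be made concrete with a toy example. The sketch below is an illustration only, not the benchmark's generator: the function names `make_arithmetic_row` and `add_uncertainty`, the sum rule, the neighbour-based noise model, and all parameter values are assumptions chosen for clarity.

```python
import random


def make_arithmetic_row(n_operands, attr_range, rng):
    """Toy row whose last value is the sum of the preceding operands.

    n_operands stands in for operand complexity (longer rows),
    attr_range for the width of the attribute value range.
    Both are hypothetical parameters for this illustration.
    """
    if attr_range < n_operands:
        raise ValueError("attr_range must be at least n_operands")
    # Cap each operand so that the sum still fits inside [1, attr_range].
    max_operand = attr_range // n_operands
    operands = [rng.randint(1, max_operand) for _ in range(n_operands)]
    return operands + [sum(operands)]


def add_uncertainty(value, attr_range, p_true=0.8):
    """Replace a crisp value with a distribution over nearby values.

    Assumed noise model: p_true on the true value, the remainder split
    evenly between its in-range neighbours.
    """
    neighbours = [v for v in (value - 1, value + 1) if 1 <= v <= attr_range]
    if not neighbours:
        return {value: 1.0}
    dist = {value: p_true}
    for v in neighbours:
        dist[v] = (1.0 - p_true) / len(neighbours)
    return dist


rng = random.Random(0)
row = make_arithmetic_row(n_operands=9, attr_range=1000, rng=rng)
noisy_row = [add_uncertainty(v, attr_range=1000) for v in row]
```

In this toy setting, a solver that reads only the most likely value of each entry in `noisy_row` ignores the alternative outcomes; weighing those alternatives is the kind of reasoning the abstract reports as difficult for LRMs.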

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Computational and Text Analysis Methods