On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks

Yannic Neuhaus; Nicolas Flammarion; Matthias Hein; Francesco Croce

arXiv:2602.15460·cs.LG·February 18, 2026

TL;DR

This paper evaluates how well chain-of-thought reasoning generalizes in multimodal large language models on a simple visual planning task, revealing limitations in out-of-distribution scenarios and highlighting the benefits of mixed text formats.

Contribution

The paper introduces a framework for assessing OOD reasoning generalization in multimodal LLMs on a grid navigation task, comparing several input representations and reasoning strategies.

Findings

01. CoT improves in-distribution generalization across representations.

02. OOD generalization remains limited, especially for larger maps.

03. Mixed text formats enhance OOD reasoning performance.

Abstract

Integrating reasoning in large language models and large vision-language models has recently led to significant improvement of their capabilities. However, the generalization of reasoning models is still vaguely defined and poorly understood. In this work, we present an evaluation framework to rigorously examine how well chain-of-thought (CoT) approaches generalize on a simple planning task. Specifically, we consider a grid-based navigation task in which a model is provided with a map and must output a sequence of moves that guides a player from a start position to a goal while avoiding obstacles. The versatility of the task and its data allows us to fine-tune model variants using different input representations (visual and textual) and CoT reasoning strategies, and systematically evaluate them under both in-distribution (ID) and out-of-distribution (OOD) test conditions. Our…
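
To make the task description concrete, here is a minimal Python sketch of a grid navigation instance, a checker for a proposed move sequence, and a breadth-first search that produces a reference plan. The map encoding (`S`, `G`, `#`, `.`), the move alphabet (`U`, `D`, `L`, `R`), and all function names are assumptions made for illustration; the paper's actual data format and evaluation code may differ.

```python
from collections import deque

# Hypothetical encoding (not taken from the paper):
# 'S' start, 'G' goal, '#' obstacle, '.' free cell; moves are U/D/L/R.
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def find(grid, symbol):
    """Return the (row, col) of the first cell containing `symbol`."""
    for r, row in enumerate(grid):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(f"{symbol!r} not found in grid")


def is_free(grid, pos):
    """True if `pos` lies inside the grid and is not an obstacle."""
    r, c = pos
    return 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] != "#"


def is_valid_plan(grid, moves):
    """Check that `moves` leads from S to G without leaving the map or hitting '#'."""
    pos = find(grid, "S")
    for m in moves:
        if m not in MOVES:
            return False
        dr, dc = MOVES[m]
        pos = (pos[0] + dr, pos[1] + dc)
        if not is_free(grid, pos):
            return False
    return pos == find(grid, "G")


def shortest_plan(grid):
    """Breadth-first search for a shortest move string, or None if G is unreachable."""
    start, goal = find(grid, "S"), find(grid, "G")
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        pos, plan = queue.popleft()
        if pos == goal:
            return plan
        for m, (dr, dc) in MOVES.items():
            nxt = (pos[0] + dr, pos[1] + dc)
            if is_free(grid, nxt) and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + m))
    return None


grid = [
    "S.#.",
    ".##.",
    "...G",
]
plan = shortest_plan(grid)
print(plan, is_valid_plan(grid, plan))
```

A checker of this kind is one way to score model outputs by whether they reach the goal, and the same map could in principle be rendered as an image or as text to compare input representations.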

Taxonomy

Topics: Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning