AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems

Weiyi Wang, Xinchi Chen, Jingjing Gong, Xuanjing Huang, Xipeng Qiu

arXiv:2601.11354 · cs.AI · January 19, 2026


TL;DR

AstroReason-Bench is a new comprehensive benchmark designed to evaluate the capabilities of agentic Large Language Models in complex, physics-constrained space planning problems, revealing current limitations of generalist agents.

Contribution

The paper introduces AstroReason-Bench, a unified, multi-regime benchmark for assessing agentic planning in space-related problems with physical constraints and heterogeneous objectives.

Findings

1. Current agents underperform specialized solvers.
2. The results highlight the limitations of generalist planning in realistic scenarios.
3. The benchmark provides a challenging testbed for future research.

Abstract

Recent advances in agentic Large Language Models (LLMs) have positioned them as generalist planners capable of reasoning and acting across diverse tasks. However, existing agent benchmarks largely focus on symbolic or weakly grounded environments, leaving their performance in physics-constrained real-world domains underexplored. We introduce AstroReason-Bench, a comprehensive benchmark for evaluating agentic planning in Space Planning Problems (SPP), a family of high-stakes problems with heterogeneous objectives, strict physical constraints, and long-horizon decision-making. AstroReason-Bench integrates multiple scheduling regimes, including ground station communication and agile Earth observation, and provides a unified agent-oriented interaction protocol. Evaluating on a range of state-of-the-art open- and closed-source agentic LLM systems, we find that current agents substantially…
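The abstract names ground station communication as one of the scheduling regimes with strict physical constraints. As a rough illustration only, the sketch below checks two constraints that commonly appear in such problems: each scheduled contact must lie inside a satellite-station visibility window, and a station may serve at most one satellite at a time. The names `Contact`, `is_feasible`, and `total_contact_time`, the time units, and the choice of total contact time as the objective are all hypothetical assumptions. They are not AstroReason-Bench's actual interface, constraint set, or scoring.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    """Hypothetical scheduled contact; times are minutes from the start of the horizon."""
    satellite: str
    station: str
    start: float
    end: float


def is_feasible(plan, windows):
    """Return True if every contact sits inside a visibility window and no station overlaps.

    windows: hypothetical dict mapping (satellite, station) -> list of (start, end) intervals.
    """
    by_station = defaultdict(list)
    for c in plan:
        if c.end <= c.start:
            return False  # reject empty or inverted intervals
        # Constraint 1: the whole contact must fit inside one visibility window.
        passes = windows.get((c.satellite, c.station), [])
        if not any(ws <= c.start and c.end <= we for ws, we in passes):
            return False
        by_station[c.station].append(c)
    # Constraint 2: a station talks to at most one satellite at a time.
    for contacts in by_station.values():
        contacts.sort(key=lambda x: x.start)
        for prev, nxt in zip(contacts, contacts[1:]):
            if nxt.start < prev.end:
                return False
    return True


def total_contact_time(plan):
    """Assumed objective for illustration: total scheduled contact duration."""
    return sum(c.end - c.start for c in plan)


# Toy instance: two satellites competing for one ground station.
windows = {
    ("SAT-A", "GS-1"): [(0, 10), (90, 100)],
    ("SAT-B", "GS-1"): [(5, 15)],
}
plan = [Contact("SAT-A", "GS-1", 0, 8), Contact("SAT-B", "GS-1", 8, 15)]
clash = [Contact("SAT-A", "GS-1", 0, 10), Contact("SAT-B", "GS-1", 5, 15)]
print(is_feasible(plan, windows), total_contact_time(plan))
print(is_feasible(clash, windows))
```

In this toy instance the first plan hands the station from SAT-A to SAT-B at minute 8 and is feasible. The second plan overlaps on the shared station and is rejected. Real instances add further physics, such as slew times for agile Earth observation, which this sketch does not model.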


Code & Models

Datasets

kaupane/astro-reason (dataset, 135 downloads)


Taxonomy

Topics: Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning · Robotic Path Planning Algorithms