Can LLMs Do Rocket Science? Exploring the Limits of Complex Reasoning with GTOC 12
Iñaki del Campo, Pablo Cuervo, Victor Rodriguez-Fernandez, Roberto Armellin, Jack Yarndley

TL;DR
This paper evaluates the reasoning and planning capabilities of large language models on complex astrodynamics tasks, revealing significant progress but also critical limitations in physical implementation and autonomous execution.
Contribution
It introduces a novel framework for assessing LLMs on high-dimensional space mission design and highlights the gap between strategic understanding and practical implementation.
Findings
Average strategic viability scores nearly doubled over two years
Advanced models understand mission concepts but struggle with physical implementation
Physical and debugging errors limit autonomous execution capabilities
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in code generation and general reasoning, yet their capacity for autonomous multi-stage planning in high-dimensional, physically constrained environments remains an open research question. This study investigates the limits of current AI agents by evaluating them against the 12th Global Trajectory Optimization Competition (GTOC 12), a complex astrodynamics challenge requiring the design of a large-scale asteroid mining campaign. We adapt the MLE-Bench framework to the domain of orbital mechanics and deploy an AIDE-based agent architecture to autonomously generate and refine mission solutions. To assess performance beyond binary validity, we employ an "LLM-as-a-Judge" methodology, utilizing a rubric developed by domain experts to evaluate strategic viability across five structural categories. A comparative analysis of…
Taxonomy
Topics: Space Satellite Systems and Control · Spacecraft Dynamics and Control · AI-based Problem Solving and Planning
