Visual Language Models as Operator Agents in the Space Domain
Alejandro Carrasco, Marco Nedungadi, Enrico M. Zucchelli, Amit Jain, Victor Rodriguez-Fernandez, Richard Linares

TL;DR
This paper investigates how Vision-Language Models can serve as operator agents in space, enhancing autonomous decision-making for both software simulations and physical robotic systems, demonstrating promising results in complex tasks.
Contribution
It introduces the novel application of VLMs as operator agents in space, integrating them into simulation and robotic systems for improved autonomous control.
Findings
VLMs effectively interpret visual data for complex orbital maneuvers.
VLMs assist in inspecting and diagnosing space objects with high accuracy.
VLMs compete with traditional methods in simulation tasks.
Abstract
This paper explores the application of Vision-Language Models (VLMs) as operator agents in the space domain, focusing on both software and hardware operational paradigms. Building on advances in Large Language Models (LLMs) and their multimodal extensions, we investigate how VLMs can enhance autonomous control and decision-making in space missions. In the software context, we employ VLMs within the Kerbal Space Program Differential Games (KSPDG) simulation environment, enabling the agent to interpret visual screenshots of the graphical user interface to perform complex orbital maneuvers. In the hardware context, we integrate VLMs with robotic systems equipped with cameras to inspect and diagnose physical space objects, such as satellites. Our results demonstrate that VLMs can effectively process visual and textual data to generate contextually appropriate actions, competing with…
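The software paradigm described in the abstract (screenshot in, maneuver out) can be pictured as a simple observe-query-act loop. The sketch below is illustrative only and is not the authors' implementation: the `Action` fields, the JSON reply format, the prompt text, and the three callables (`get_screenshot`, `query_vlm`, `apply_action`) are all hypothetical names assumed for this example, and no real KSPDG or model API is called.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, List

# Hypothetical prompt; the wording used in the paper is not given in this summary.
PROMPT = "Given this screenshot, reply with JSON: forward, right, down, duration."


@dataclass
class Action:
    # Assumed action format: three throttle components in [-1, 1]
    # and a burn duration in seconds.
    forward: float
    right: float
    down: float
    duration: float


def clamp(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Limit a throttle value to the allowed range."""
    return max(lo, min(hi, x))


def parse_action(reply: str) -> Action:
    """Parse the model's JSON reply; fall back to zero thrust if it is malformed."""
    try:
        data = json.loads(reply)
        return Action(
            forward=clamp(float(data["forward"])),
            right=clamp(float(data["right"])),
            down=clamp(float(data["down"])),
            duration=max(0.0, float(data.get("duration", 1.0))),
        )
    except (ValueError, KeyError, TypeError):
        return Action(0.0, 0.0, 0.0, 0.0)


def run_agent(
    get_screenshot: Callable[[], Any],
    query_vlm: Callable[[Any, str], str],
    apply_action: Callable[[Action], None],
    steps: int,
) -> List[Action]:
    """Observe the GUI, ask the model for a maneuver, and apply it, `steps` times."""
    history: List[Action] = []
    for _ in range(steps):
        image = get_screenshot()          # observe: capture the simulator GUI
        reply = query_vlm(image, PROMPT)  # query: image plus text prompt to the model
        action = parse_action(reply)      # validate before touching the controls
        apply_action(action)              # act: send throttle to the simulator
        history.append(action)
    return history
```

Keeping the parsing and clamping separate from the model call means a malformed or out-of-range reply degrades to a safe zero-thrust action rather than an exception in the control loop; the same loop shape would apply to the hardware case with a camera frame in place of a screenshot.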
