Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection

Minghui Jia; Qichao Zhang; Ali Luo; Linjing Li; Shuo Ye; Hailing Lu; Wen Hou; Dongbin Zhao

arXiv:2601.06498 · cs.CL · April 24, 2026

TL;DR

Spec-o3 is a multimodal agent that automates spectral inspection for rare celestial objects, significantly improving accuracy and generalization over existing methods, and supporting transparent decision-making.

Contribution

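The paper introduces Spec-o3, a tool-augmented vision-language agent trained with a two-stage post-training recipe (cold-start supervised fine-tuning followed by outcome-based reinforcement learning), and reports state-of-the-art results in rare celestial object identification.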

Findings

01. Boosted macro-F1 score from 28.3 to 76.5 on LAMOST data.

02. Outperformed proprietary VLMs and deep learning classifiers in accuracy.

03. Demonstrated strong generalization across different survey datasets.
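
For readers unfamiliar with the headline metric, the following is a minimal sketch of how macro-F1 is conventionally computed: the per-class F1 scores are averaged with equal weight, so a rare class counts as much as a common one. The function name `macro_f1` and the toy labels `"common"` and `"rare"` are illustrative assumptions, not taken from the paper or its repository, and the paper's exact evaluation protocol may differ.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class is
        # never present and never predicted correctly.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy, made-up labels: a classifier that always answers "common".
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.8
print(round(macro_f1(y_true, y_pred), 3)) # 0.444
```

In this toy case the classifier reaches 80% accuracy while never identifying a rare object, and macro-F1 drops to about 0.44 because the rare class contributes an F1 of zero. This is one reason macro-F1 is a common choice for rare-class tasks.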

Abstract

Due to the limited generalization and interpretability of deep learning classifiers, the final vetting of rare celestial object candidates still relies on expert visual inspection, a labor-intensive manual process. In this process, astronomers use specialized tools to analyze spectra and construct reliable catalogs. However, this practice has become the primary bottleneck, as it cannot scale with the data deluge from modern spectroscopic surveys. To bridge this gap, we propose Spec-o3, a tool-augmented vision-language agent that performs astronomer-aligned spectral inspection via interleaved multimodal chain-of-thought reasoning. Spec-o3 is trained with a two-stage post-training recipe: cold-start supervised fine-tuning on expert inspection trajectories, followed by outcome-based reinforcement learning on rare-type verification tasks. Evaluated on five…

Code & Models

Repositories

Maxwell-Jia/spec-o3 (GitHub)
