Spectra as Language: Large Language Models for Scalable Stellar Parameter and Abundance Inference

Hai-Ling Lu; Yu-Yang Li; Yin-Bi Li; Cun-Shi Wang; A-Li Luo; Jun-Chao Liang; Shuo Li

arXiv:2605.22162 · astro-ph.IM · May 22, 2026

TL;DR

This paper introduces a large language model framework for stellar spectra analysis, enabling scalable and accurate inference of stellar parameters and chemical abundances from massive spectroscopic datasets.

Contribution

It adapts language models to stellar spectra, showing that performance improves systematically with data scale and that the approach scales to large stellar surveys.

Findings

01. Achieves accurate estimation of stellar parameters and chemical abundances.

02. Performance improves systematically with increasing data scale.

03. Provides a scalable framework for future large-scale surveys.

Abstract

Stellar spectra encode key information on the physical properties and chemical compositions of stars. Accurate stellar parameter determination is essential for addressing major questions such as galaxy and stellar evolution. Large-scale spectroscopic surveys have accumulated unprecedented spectral data. Traditional feature extraction or model-fitting approaches struggle with high-dimensional, massive datasets, limited generalization, and computational inefficiency. Recent advances in large language models demonstrate strong generalization and feature-learning in tasks like natural language processing, DNA/RNA sequence analysis, and protein/chemical parsing. Stellar spectra are continuous sequential signals, enabling the transfer of language models to stellar spectroscopy. Here, we propose a two-stage large language model framework for stellar parameter inference, achieving accurate…
