AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy

Rui Pan; Tuan Dung Nguyen; Hardik Arora; Alberto Accomazzi; Tirthankar Ghosal; Yuan-Sen Ting

arXiv:2409.19750·astro-ph.IM·October 1, 2024·2 cites

TL;DR

This paper evaluates and improves astronomy-specific large language models through continual pretraining and benchmarking, introducing new models and highlighting the importance of high-quality domain data for performance enhancement.

Contribution

The paper introduces the AstroLLaMA-3-8B and AstroLLaMA-2-70B models, evaluates them on a recently curated set of astronomy multiple-choice questions, and demonstrates the benefits of domain-specific continual pretraining.

Findings

01

Continual pretraining on high-quality data improves model performance.

02

Smaller models suffer from catastrophic forgetting: the earlier AstroLLaMA series, based on LLaMA-2-7B, underperforms its base model.

03

70B models benefit significantly from domain-specific pretraining.
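
The findings above rest on comparing each specialised model with its base model on multiple-choice astronomy questions. A minimal sketch of that comparison follows, assuming each model's output has already been reduced to one answer letter per question. The function names `accuracy` and `score_change` and the toy data are hypothetical illustrations, not the paper's evaluation code or results.

```python
from typing import List


def accuracy(predictions: List[str], answers: List[str]) -> float:
    """Fraction of multiple-choice questions answered correctly."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def score_change(base_preds: List[str], tuned_preds: List[str],
                 answers: List[str]) -> float:
    """Accuracy of the specialised model minus accuracy of its base model.

    A negative value means the specialised model does worse than the base
    model on this question set.
    """
    return accuracy(tuned_preds, answers) - accuracy(base_preds, answers)


# Toy, made-up data purely for illustration (not results from the paper).
answers = ["A", "C", "B", "D"]
base_preds = ["A", "C", "B", "A"]    # hypothetical base model: 3 of 4 correct
tuned_preds = ["A", "B", "B", "A"]   # hypothetical specialised model: 2 of 4 correct

delta = score_change(base_preds, tuned_preds, answers)
print(f"accuracy change after continual pretraining: {delta:+.2f}")
```

Reporting the difference against the base model, rather than the specialised model's accuracy alone, is what lets degradation in a small model and improvement in a large one be read off the same scale.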

Abstract

Continual pretraining of large language models on domain-specific data has been proposed to enhance performance on downstream tasks. In astronomy, the previous absence of astronomy-focused benchmarks has hindered objective evaluation of these specialized LLMs. Leveraging a recent initiative to curate high-quality astronomical multiple-choice questions (MCQs), this study aims to quantitatively assess specialized LLMs in astronomy. We find that the previously released AstroLLaMA series, based on LLaMA-2-7B, underperforms compared to the base model. We demonstrate that this performance degradation can be partially mitigated by utilizing high-quality data for continual pretraining, such as summarized text from arXiv. Despite the observed catastrophic forgetting in smaller models, our results indicate that continual pretraining on the 70B model can yield significant improvements. However, the current supervised…

Taxonomy

Topics: Astronomical Observations and Instrumentation · Geophysics and Gravity Measurements · Astronomy and Astrophysical Research

Methods: Sparse Evolutionary Training · Balanced Selection