AstroMLab 2: AstroLLaMA-2-70B Model and Benchmarking Specialised LLMs for Astronomy
Rui Pan, Tuan Dung Nguyen, Hardik Arora, Alberto Accomazzi, Tirthankar Ghosal, Yuan-Sen Ting

TL;DR
This paper evaluates and improves astronomy-specific large language models through continual pretraining and benchmarking, introducing new models and highlighting the importance of high-quality domain data for performance enhancement.
Contribution
It introduces AstroLLaMA-3-8B and AstroLLaMA-2-70B models, provides a new astronomy benchmark, and demonstrates the benefits of domain-specific continual pretraining.
Findings
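Continual pretraining on high-quality data, such as summarized text from arXiv, partially mitigates the performance degradation seen in earlier specialized models.
Smaller models suffer from catastrophic forgetting: the earlier AstroLLaMA series, based on LLaMA-2-7B, underperforms its base model.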
70B models benefit significantly from domain-specific pretraining.
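The findings above rest on comparing a specialised model's multiple-choice benchmark score against that of its base model. A minimal sketch of such a comparison follows. The function names (`mcq_accuracy`, `accuracy_delta`) and the toy answer lists are hypothetical and chosen only for illustration; they are not the paper's evaluation code, and the numbers are not results from the paper.

```python
from typing import Sequence


def mcq_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the fraction of multiple-choice questions answered correctly."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    # Compare option letters case-insensitively, ignoring surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def accuracy_delta(
    base_preds: Sequence[str],
    tuned_preds: Sequence[str],
    answers: Sequence[str],
) -> float:
    """Accuracy of the specialised model minus accuracy of the base model.

    A negative value means the specialised model scores below its base model
    on this question set; a positive value means it scores above.
    """
    return mcq_accuracy(tuned_preds, answers) - mcq_accuracy(base_preds, answers)


# Toy, made-up data purely for illustration (not taken from the paper).
answers = ["A", "C", "B", "D"]
base_preds = ["A", "C", "B", "A"]   # 3 of 4 correct
tuned_preds = ["A", "B", "B", "A"]  # 2 of 4 correct

delta = accuracy_delta(base_preds, tuned_preds, answers)
print(f"accuracy change after continual pretraining: {delta:+.2f}")
```

In this toy case the delta is negative, which is the pattern the findings describe for the smaller models; a positive delta would correspond to the improvement reported for the 70B model.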
Abstract
Continual pretraining of large language models on domain-specific data has been proposed to enhance performance on downstream tasks. In astronomy, the previous absence of astronomy-focused benchmarks has hindered objective evaluation of these specialized LLM models. Leveraging a recent initiative to curate high-quality astronomical MCQs, this study aims to quantitatively assess specialized LLMs in astronomy. We find that the previously released AstroLLaMA series, based on LLaMA-2-7B, underperforms compared to the base model. We demonstrate that this performance degradation can be partially mitigated by utilizing high-quality data for continual pretraining, such as summarized text from arXiv. Despite the observed catastrophic forgetting in smaller models, our results indicate that continual pretraining on the 70B model can yield significant improvements. However, the current supervised…
Code & Models
- 🤗 AstroMLab/astrollama-2-70b-base_aic (model) · 13 dl · ♡ 2
- 🤗 AstroMLab/astrollama-2-70b-chat_aic (model) · 6 dl · ♡ 1
- 🤗 AstroMLab/astrollama-3-8b-base_aic (model) · 6 dl · ♡ 1
- 🤗 AstroMLab/astrollama-3-8b-chat_aic (model) · 6 dl
- 🤗 AstroMLab/astrollama-3-8b-base_summary (model) · 10 dl
- 🤗 AstroMLab/astrollama-3-8b-chat_summary (model) · 6 dl · ♡ 1
- 🤗 RichardErkhov/AstroMLab_-_astrollama-3-8b-base_summary-4bits (model) · 1 dl
- 🤗 RichardErkhov/AstroMLab_-_astrollama-3-8b-base_summary-8bits (model) · 2 dl
- 🤗 RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-4bits (model)
- 🤗 RichardErkhov/AstroMLab_-_astrollama-3-8b-base_aic-8bits (model) · 1 dl
Taxonomy
Topics: Astronomical Observations and Instrumentation · Geophysics and Gravity Measurements · Astronomy and Astrophysical Research
Methods: Sparse Evolutionary Training · Balanced Selection
