Can LLMs Produce Original Astronomy Research in a Semester? A Graduate Class Experiment
Ann Zabludoff, Chen-Yu Chuang, Parker Thomas Johnson, Yichen Liu, Brina Bianca Martinez, Neev Shah, Lucille Steffes, Gabriel Glen Weible

TL;DR
This study explores whether large language models can assist graduate students in conducting original astronomy research within a semester, highlighting both potential benefits and current limitations.
Contribution
The paper reports classroom evidence on how well LLMs support complex research tasks in a graduate astronomy course, and where they fall short.
Findings
All students completed a draft paper on an unsolved galaxy-related problem by semester's end, though only about half thought the models saved them time.
Models returned false citations, links, or paper summaries about 20% of the time and struggled with complex code and data retrieval.
Students' perceptions of LLM usefulness varied, with concerns about creativity and accuracy.
Abstract
We discuss the results of using large language models (LLMs) to conduct original scientific research in an unfamiliar subject area during the Fall 2025 semester. Students in a graduate astronomy and astrophysics course were asked to test whether LLMs could help them complete research tasks faster and at a level of detail and accuracy required for scientific publication. Most students employed LLMs for a total of 5-10 hours. While all students completed a draft paper on an unsolved problem related to galaxies by semester's end, their impressions of the models' value varied. About half thought that the models saved them time. Many noted that LLMs failed to provide appropriately detailed insights or steps to addressing open, niche questions over a several-month timeframe. The LLMs also frequently (about 20% of the time) returned false citations, links, or summaries of papers. The models…
