Quantifying the Reconstructability of Astrophysical Methods with Large Language Models and Information Theory: A Case Study in Spectral Reconstruction

Hsing Wen Lin; Zong-Fu Sie

arXiv:2605.11154 · astro-ph.IM · May 13, 2026

TL;DR

This study introduces an information-theoretic framework using Large Language Models to assess how well astrophysical methods can be reconstructed from textual descriptions, highlighting limitations in capturing implementation details.

Contribution

The paper presents an approach that combines information theory with LLMs to quantify how well a scientific method can be reconstructed from its text, demonstrated through a case study of Trans-Neptunian Object (TNO) spectral reconstruction from sparse photometry.

Findings

01. Text clarifies overall algorithmic structure but not implementation details.

02. Persistent variance at the implementation level creates an 'entropy floor'.

03. LLMs recover core methodologies but miss tacit expert knowledge.

Abstract

Modern astrophysical studies rely heavily on complex data analysis pipelines; however, published descriptions often lack the detail required for computational reproducibility. In this work, we present an information-theoretic framework to quantify how effectively a method can be reconstructed from its written description. By treating algorithmic reconstruction as a probability distribution generated by Large Language Models (LLMs), we utilize Shannon entropy and Jensen-Shannon divergence to measure how strongly text constrains the hypothesis space of valid implementations. We demonstrate this approach through a case study of Trans-Neptunian Object (TNO) spectral reconstruction from sparse photometry. By prompting frontier LLMs with varying levels of manuscript text (Title, Abstract, and Methods), we find that while increasing text successfully clarifies the overall algorithmic…
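The two measures named in the abstract can be illustrated with a minimal sketch. It assumes each LLM reconstruction has already been reduced to one categorical label for a single design choice. The labels, the sample lists, and the function names below are hypothetical and are not taken from the paper; the paper's own categorization scheme is not described in this summary.

```python
import math
from collections import Counter


def distribution(choices):
    """Empirical probability of each categorical label in a list of samples."""
    counts = Counter(choices)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def shannon_entropy(p):
    """Shannon entropy in bits of a distribution given as {label: probability}."""
    return sum(q * math.log2(1.0 / q) for q in p.values() if q > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (base-2 logs), bounded in [0, 1]."""
    labels = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of labels.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in labels}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence KL(a || m); m[k] > 0 wherever a[k] > 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical labels for one design choice, sampled from repeated LLM runs
# under two prompt conditions (illustrative data, not the paper's).
title_only = ["interpolation", "template_fit", "pca", "template_fit"]
with_methods = ["template_fit", "template_fit", "template_fit", "template_fit"]

p_title = distribution(title_only)
p_methods = distribution(with_methods)

print(shannon_entropy(p_title))    # 1.5 bits: the title alone leaves the choice open
print(shannon_entropy(p_methods))  # 0.0 bits: every sample agrees
print(js_divergence(p_title, p_methods))
```

In this toy setup, a drop in entropy between prompt conditions indicates that the added text narrowed the set of plausible implementations, and the Jensen-Shannon divergence measures how far the distribution of choices shifted. An entropy that stays above zero no matter how much text is supplied would correspond to the 'entropy floor' described in the findings.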
