Encoding and Understanding Astrophysical Information in Large Language Model-Generated Summaries
Kiera McCormick, Rafael Martínez-Galarza

TL;DR
This paper explores how large language models encode astrophysical measurement information in summaries, using sparse autoencoders to examine how prompting and language features influence that encoding.
Contribution
It introduces a method to analyze how LLMs encode physical measurement data in astrophysics, focusing on prompt effects and linguistic features, with sparse autoencoders used to extract interpretable features.
Findings
Prompting significantly affects the encoding of physical quantities.
Certain language features are more important for encoding physics.
Autoencoders can extract interpretable astrophysical features from summaries.
Abstract
Large Language Models have demonstrated the ability to generalize well at many levels across domains and modalities, and have even shown in-context learning capabilities. This raises research questions about how they can be used to encode physical information that is usually available only from scientific measurements and is only loosely encoded in textual descriptions. Using astrophysics as a test bed, we investigate whether LLM embeddings can codify physical summary statistics obtained from scientific measurements through two main questions: 1) Does prompting play a role in how those quantities are codified by the LLM? and 2) What aspects of language are most important in encoding the physics represented by the measurement? We investigate this using sparse autoencoders that extract interpretable features from the text.
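For readers unfamiliar with the term, a sparse autoencoder can be sketched in a few lines. The abstract does not specify the architecture, dimensions, or loss used in the paper, so everything below is an illustrative assumption: a single ReLU encoder layer, a linear decoder, a mean-squared reconstruction error plus an L1 penalty on the hidden activations, and a random vector standing in for an LLM embedding. The names `sae_forward`, `sae_loss`, and `l1_coeff` are hypothetical.

```python
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode x into non-negative hidden features, then reconstruct it."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    h = [max(0.0, p) for p in pre]  # ReLU keeps activations non-negative
    x_hat = [r + b for r, b in zip(matvec(W_dec, h), b_dec)]
    return h, x_hat


def sae_loss(x, x_hat, h, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse h."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    sparsity = sum(abs(a) for a in h)
    return mse + l1_coeff * sparsity


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


# Toy sizes (assumed): an 8-dimensional stand-in "embedding" expanded
# into 32 candidate features.
rng = random.Random(0)
d_embed, d_hidden = 8, 32
x = [rng.gauss(0.0, 1.0) for _ in range(d_embed)]
W_enc = init_matrix(d_hidden, d_embed, rng)
W_dec = init_matrix(d_embed, d_hidden, rng)
b_enc = [0.0] * d_hidden
b_dec = [0.0] * d_embed

h, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, x_hat, h)
```

Minimizing a loss of this kind over many embeddings pushes most entries of `h` to zero for any given input, which is what makes the individual hidden units candidates for interpretation. The sketch only evaluates the objective once with untrained weights; it does not train anything.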
Taxonomy
Topics: Computational Physics and Python Applications · Gaussian Processes and Bayesian Inference · Computational and Text Analysis Methods
