Mechanistic Interpretability of LoRA-Adapted Language Models for Nuclear Reactor Safety Applications

Yoon Pyo Lee

arXiv:2507.09931·cs.LG·September 16, 2025

Open Access

TL;DR

This paper introduces a novel interpretability methodology for LLMs in nuclear safety, identifying key neurons involved in domain-specific reasoning and demonstrating their causal role in model performance.

Contribution

It presents a new approach to understanding and verifying neural circuits in LLMs adapted for nuclear safety applications, enhancing transparency and trust.

Findings

01. Domain adaptation significantly altered a sparse set of neurons.

02. Silencing all specialized neurons degraded model performance.

03. Silencing impaired the model's ability to generate technical, contextually accurate information.

Abstract

The integration of Large Language Models (LLMs) into safety-critical domains, such as nuclear engineering, necessitates a deep understanding of their internal reasoning processes. This paper presents a novel methodology for interpreting how an LLM encodes and utilizes domain-specific knowledge, using a Boiling Water Reactor system as a case study. We adapted a general-purpose LLM (Gemma-3-1b-it) to the nuclear domain using a parameter-efficient fine-tuning technique known as Low-Rank Adaptation. By comparing the neuron activation patterns of the base model to those of the fine-tuned model, we identified a sparse set of neurons whose behavior was significantly altered during the adaptation process. To probe the causal role of these specialized neurons, we employed a neuron silencing technique. Our results demonstrate that while silencing most of these specialized neurons individually did…
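
The workflow described in the abstract (apply a low-rank update, compare neuron activations between the base and adapted model, then silence the most-changed neurons) can be illustrated on a toy network. The following is a minimal sketch under stated assumptions, not the author's code: it uses a two-layer ReLU MLP in NumPy instead of Gemma-3-1b-it, the LoRA factors `A` and `B` are random rather than trained, the "specialized" neurons are planted by hand at the hypothetical indices in `touched`, and the effect of silencing is measured as a mean output shift rather than with a statistical test on task performance.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, rank = 16, 64, 8, 2

# Frozen "base" weights of a toy two-layer MLP (stand-in for one MLP block).
W1 = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_out, d_hidden)) / np.sqrt(d_hidden)

# LoRA-style update: W1_lora = W1 + B @ A, with B (d_hidden x r), A (r x d_in).
# Assumption for illustration: B is nonzero on only three rows, so only
# three hidden neurons are affected by the adaptation.
A = rng.normal(size=(rank, d_in))
B = np.zeros((d_hidden, rank))
touched = np.array([3, 17, 42])  # hypothetical planted neuron indices
B[touched] = 1.0
W1_lora = W1 + B @ A


def forward(W_in, W_out, X, silenced=()):
    """Run the toy MLP; zero the hidden activations listed in `silenced`."""
    h = np.maximum(X @ W_in.T, 0.0)  # ReLU hidden activations, (n, d_hidden)
    if len(silenced):
        h[:, list(silenced)] = 0.0   # neuron silencing (ablation)
    return h @ W_out.T


# Step 1: compare activations of base vs. adapted model on the same inputs.
X = rng.normal(size=(200, d_in))     # stand-in for domain prompts
h_base = np.maximum(X @ W1.T, 0.0)
h_lora = np.maximum(X @ W1_lora.T, 0.0)
shift = np.abs(h_lora - h_base).mean(axis=0)  # per-neuron mean change

# Step 2: pick the neurons whose behavior changed the most.
specialized = np.argsort(shift)[-3:]

# Step 3: silence neurons individually and as a group, and measure how far
# the adapted model's output moves from its unsilenced output.
y_ref = forward(W1_lora, W2, X)


def effect(idx):
    y = forward(W1_lora, W2, X, silenced=idx)
    return float(np.linalg.norm(y - y_ref, axis=1).mean())


single_effects = {int(i): effect([i]) for i in specialized}
group_effect = effect(specialized)

print("most-changed neurons:", sorted(int(i) for i in specialized))
print("individual silencing effects:", single_effects)
print("group silencing effect:", group_effect)
```

In a real model the same idea would typically be implemented with activation hooks on the transformer's MLP layers, and the comparison between individual and group silencing would be made on a task metric with a significance test, as the abstract indicates.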

Taxonomy

Topics: Risk and Safety Analysis · Topic Modeling