TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs

Felipe Nuti; Tim Franzmeyer; João Henriques

arXiv:2506.23423 · cs.CL · July 1, 2025

TL;DR

This paper introduces TuCo, a method to quantify how fine-tuning influences individual responses of large language models by decomposing responses into pre-training and fine-tuning components, enabling detailed analysis of model behavior and safety.

Contribution

The paper presents a method for measuring the contribution of fine-tuning to individual LLM outputs. The method tracks the model's intermediate hidden states and rests on an exact decomposition of the fine-tuned model into a pre-training component and a fine-tuning component.

Findings

01

Model behavior and performance can be steered by scaling the fine-tuning component up or down during the forward pass.

02

Attenuating the effect of fine-tuning on model outputs appears to play a role in the success of adversarial attacks.

03

TuCo correlates with safety and attack success in LLMs.

Abstract

Past work has studied the effects of fine-tuning on large language models' (LLMs) overall performance on certain tasks. However, a quantitative and systematic method for analyzing its effect on individual outputs is still lacking. Here, we propose a new method for measuring the contribution that fine-tuning makes to individual LLM responses, assuming access to the original pre-trained model. Our method tracks the model's intermediate hidden states, providing a more fine-grained insight into the effects of fine-tuning than a simple comparison of final outputs from pre-trained and fine-tuned models. We introduce and theoretically analyze an exact decomposition of any fine-tuned LLM into a pre-training component and a fine-tuning component. Empirically, we find that model behavior and performance can be steered by up- or down-scaling the fine-tuning component during the forward pass.…
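The decomposition and scaling idea in the abstract can be illustrated with a toy residual network. This is a minimal sketch under stated assumptions, not the authors' implementation: the layers are hypothetical elementwise `tanh` maps, the fine-tuning component of a layer is taken to be the difference between the fine-tuned and pre-trained layer outputs on the same hidden state, and `contribution_ratio` is an illustrative normalized magnitude ratio rather than the paper's exact TuCo definition. All function names (`make_layer`, `run_scaled`, `contribution_ratio`) and the scaling parameter `alpha` are invented for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Sequence[float]], Vector]


def make_layer(weight: float, bias: float) -> Layer:
    """Hypothetical toy layer: elementwise tanh(weight * v + bias)."""
    def layer(h: Sequence[float]) -> Vector:
        return [math.tanh(weight * v + bias) for v in h]
    return layer


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [u + v for u, v in zip(a, b)]


def sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [u - v for u, v in zip(a, b)]


def norm(a: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in a))


def run(x: Sequence[float], layers: Sequence[Layer]) -> Vector:
    """Plain residual forward pass: h <- h + f(h) for each layer."""
    h = list(x)
    for f in layers:
        h = add(h, f(h))
    return h


def run_scaled(
    x: Sequence[float],
    pt_layers: Sequence[Layer],
    ft_layers: Sequence[Layer],
    alpha: float = 1.0,
) -> Tuple[Vector, Vector, Vector]:
    """Residual forward pass with the fine-tuning component scaled by alpha.

    Returns the final hidden state plus the accumulated pre-training and
    (scaled) fine-tuning component outputs across layers.
    """
    h = list(x)
    pt_sum = [0.0] * len(h)
    ft_sum = [0.0] * len(h)
    for f_pt, f_ft in zip(pt_layers, ft_layers):
        ptc = f_pt(h)                                 # pre-training component
        ftc = [alpha * d for d in sub(f_ft(h), ptc)]  # scaled fine-tuning component
        h = add(h, add(ptc, ftc))
        pt_sum = add(pt_sum, ptc)
        ft_sum = add(ft_sum, ftc)
    return h, pt_sum, ft_sum


def contribution_ratio(pt_sum: Sequence[float], ft_sum: Sequence[float]) -> float:
    """Illustrative ratio in [0, 1]; an assumption, not the paper's formula."""
    denom = norm(pt_sum) + norm(ft_sum)
    return norm(ft_sum) / denom if denom > 0 else 0.0


# Hypothetical "pre-trained" and "fine-tuned" weights for a two-layer toy model.
pt_layers = [make_layer(0.5, 0.0), make_layer(-0.3, 0.1)]
ft_layers = [make_layer(0.6, 0.05), make_layer(-0.2, 0.1)]
x = [1.0, -2.0, 0.5]
```

In this sketch, `alpha=1.0` reproduces the fine-tuned toy model and `alpha=0.0` reproduces the pre-trained one, so intermediate or larger values interpolate or extrapolate the effect of fine-tuning. With a real LLM the same idea would require running both models' layers on the same hidden states, which is why access to the original pre-trained model is assumed.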

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)