Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning

Yejin Kim; Eunwon Kim; Buru Chang; Junsuk Choe

arXiv:2508.21300 · cs.LG · September 1, 2025

TL;DR

This paper introduces VILA, a new unlearning framework for LoRA-based LLMs that improves Fisher information estimation accuracy and computational efficiency, enabling effective removal of sensitive data with significantly reduced costs.

Contribution

VILA explicitly addresses limitations in FILA by improving Fisher information estimation and reducing computational costs for unlearning in LoRA-based LLMs.

Findings

01

VILA achieves up to 100x higher parameter efficiency.

02

VILA is 40x faster in training compared to FILA.

03

Sets new state-of-the-art performance on TOFU, WMDP, and MUSE benchmarks.

Abstract

LLMs have demonstrated remarkable performance across various tasks but face challenges related to unintentionally generating outputs containing sensitive information. A straightforward approach to address this issue is to retrain the model after excluding the problematic data. However, this approach incurs prohibitively high computational costs. To overcome this limitation, machine unlearning has emerged as a promising solution that can effectively remove sensitive information without the need to retrain the model from scratch. Recently, FILA has been proposed as a parameter-efficient unlearning method by integrating LoRA adapters. Specifically, it calculates the Fisher information to identify parameters associated with the forget set and assigns them to LoRA adapters for updates. Despite its innovative approach, FILA still requires access to all model parameters and does not adequately…

Tables

Table 1: Time Costs. This table reports the GPU hours required for Retrain, Unlearn, and FILA on the TOFU benchmark using the Llama2-7B model. The Forget N% setting indicates that N% of the full dataset is designated as the forget set. For Retrain, the model is trained from scratch using only the retain set, which consists of the remaining (100–N)% of the data.

Method	Forget 1%	Forget 5%	Forget 10%
Retrain	2.28	2.18	2.08
FILA - $\mathcal{M}(\mathcal{D})$ extraction	0.25	1.21	9.10
FILA - Unlearning	0.02	0.06	0.12

Table 2: Main Comparison Results. Top: Unlearning performance on TOFU with Phi-1.5B and Llama2-7B across varying forget ratios. AVG Gain (↑) denotes the average improvement in unlearning loss from each initialization method, measured across data splits. Bottom: Unlearning performance on WMDP and MUSE Books. AVG is the mean of the two forget metrics per benchmark. Lower scores indicate better forgetting performance. Retain performance is omitted, as it is constrained to remain above 95% by our evaluation protocol.

Model	Method	Forget 1%	Forget 5%	Forget 10%	AVG Gain (↑)
Phi-1.5B	Original Model	-4.05	-11.92	-15.66	-
	GD	-2.52	-11.18	-14.43	-
	GD + FILA	-2.17	-10.23	-13.84	0.63
	GD + Ours	-1.54	-9.61	-10.80	2.06
	NPO	-2.52	-7.89	-10.03	-
	NPO + FILA	-2.17	-6.09	-8.83	1.12
	NPO + Ours	-2.17	-5.17	-9.30	1.27
	IHL	-2.52	-10.23	-14.13	-
	IHL + FILA	-2.17	-5.40	-1.79	5.84
	IHL + Ours	-1.85	-1.17	-0.83	7.68
Llama2-7B	Original Model	-3.30	-15.46	-19.31	-
	GD	-3.30	-9.92	-16.61	-
	GD + FILA	-3.30	-12.53	-17.27	-1.09
	GD + Ours	-2.17	-1.40	-1.18	8.36
	NPO	-3.30	-13.59	-13.84	-
	NPO + FILA	-3.30	-11.18	-11.06	1.73
	NPO + Ours	-1.54	-4.32	-4.59	6.76
	IHL	-3.30	-12.53	-7.70	-
	IHL + FILA	-3.30	-0.95	-0.47	6.27
	IHL + Ours	-1.27	-0.20	-0.40	7.22
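
As a reading aid for the AVG Gain column, the sketch below recomputes two of the Phi-1.5B entries. It assumes the gain is the mean per-split difference between an initialization method and its base loss; the helper name `avg_gain` is ours and does not come from the paper.

```python
def avg_gain(base_scores, method_scores):
    """Mean per-split improvement of a method over its base loss."""
    diffs = [m - b for b, m in zip(base_scores, method_scores)]
    return sum(diffs) / len(diffs)

# Phi-1.5B rows from Table 2 (Forget 1%, 5%, 10%).
gd = [-2.52, -11.18, -14.43]
gd_fila = [-2.17, -10.23, -13.84]
gd_ours = [-1.54, -9.61, -10.80]

fila_gain = round(avg_gain(gd, gd_fila), 2)
ours_gain = round(avg_gain(gd, gd_ours), 2)
print(fila_gain, ours_gain)  # 0.63 2.06, matching the table
```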

Table 3: Comparison of time and storage cost across different methods. We report GPU hours required for unlearning under varying forget set sizes (1%, 5%, 10%) on the TOFU benchmark using the Llama2-7B model. Storage for $\mathcal{M}(\mathcal{D})$ denotes the additional space required to store the forget information map used during unlearning.

Method	Time, Forget 1% (GPU hours)	Time, Forget 5% (GPU hours)	Time, Forget 10% (GPU hours)	Storage for $\mathcal{M}(\mathcal{D})$
Retrain	2.28	2.18	2.08	-
FILA	0.27	1.27	9.22	25G
Ours	0.04	0.18	0.36	0.3G
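
The ratios implied by this table can be recomputed directly. The snippet below only restates the table's numbers; the variable names are ours, and the storage values are kept in the table's own "G" unit.

```python
# GPU hours and storage taken from Table 3 (Llama2-7B, TOFU).
fila_hours = {"1%": 0.27, "5%": 1.27, "10%": 9.22}
ours_hours = {"1%": 0.04, "5%": 0.18, "10%": 0.36}
fila_storage, ours_storage = 25.0, 0.3  # both in the table's "G" unit

speedup = {k: fila_hours[k] / ours_hours[k] for k in fila_hours}
storage_ratio = fila_storage / ours_storage

for split, ratio in speedup.items():
    print(f"Forget {split}: {ratio:.1f}x fewer GPU hours than FILA")
print(f"Importance-map storage: {storage_ratio:.0f}x smaller")
```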

Table 4: Ablation results isolating the impact of FI Correction and LoRA Approximation on unlearning performance. Results show Forget Quality scores (lower is better). AVG Gain denotes improvement over the FILA baseline.

Loss	Method	Forget 1%	Forget 5%	Forget 10%	AVG Gain (↑)
GD	FILA (Baseline)	-2.17	-10.23	-13.84	-
	w/ FI Correction	-1.27	-9.61	-9.54	1.94
	w/ LoRA Approximation	-2.17	-9.61	-13.54	0.30
	w/ Both (VILA)	-1.54	-9.61	-10.80	1.43
NPO	FILA (Baseline)	-2.17	-6.09	-8.83	-
	w/ FI Correction	-1.85	-6.34	-5.85	1.02
	w/ LoRA Approximation	-2.17	-6.58	-9.30	-0.32
	w/ Both (VILA)	-2.17	-5.17	-9.30	0.15
IHL	FILA (Baseline)	-2.17	-5.40	-1.79	-
	w/ FI Correction	-1.54	-0.85	-0.47	2.17
	w/ LoRA Approximation	-2.17	-4.53	-0.19	0.82
	w/ Both (VILA)	-1.85	-1.17	-0.83	1.84

Table 5: Validity of the Expectation as an Importance Score on Phi-1.5B. AVG Gain (↑) denotes average improvement in unlearning loss across splits.

Loss	Method	Forget 1%	Forget 5%	Forget 10%	AVG Gain (↑)
GD	FILA	-2.17	-10.23	-13.84	-
	ExpILA	-1.27	-10.54	-10.54	1.93
	AbsILA	-1.54	-10.23	-10.29	2.03
	VILA	-1.54	-9.61	-10.80	2.06
NPO	FILA	-2.17	-6.09	-8.83	-
	ExpILA	-1.85	-5.62	-10.54	2.74
	AbsILA	-1.85	-5.86	-9.06	1.23
	VILA	-1.85	-1.17	-0.83	7.68
IHL	FILA	-2.17	-5.40	-1.79	-
	ExpILA	-0.39	-3.37	-6.05	2.43
	AbsILA	-1.27	-3.55	-5.11	5.65
	VILA	-1.85	-1.17	-0.83	7.68

Table 6: Average norm values of each term across all parameters. The gradients are computed separately for the forget and retain sets, resulting in distinct values for gradient-dependent terms. Note that $B_{0}A_{0}$ does not depend on any gradients but is determined solely by the initialization, and thus yields the same value for both sets.

Term	Forget Set	Retain Set
$B_{0} A_{0}$	0.00568	0.00568
$\Delta B A_{0}$	39.75	37.0
$B_{0} \Delta A$	40.5	38.5
$\Delta B \Delta A$	688128.0	585728.0

Table 7: In-domain unlearning performance on the TOFU dataset. MU: Model Utility (higher is better), FQ: Forget Quality (lower is better).

Method	Llama2-7B MU (↑)	Llama2-7B FQ (↓)	Phi-1.5B MU (↑)	Phi-1.5B FQ (↓)
IHL (Baseline)	0.95	0.65	0.88	0.69
IHL + FILA	0.93	0.50	0.89	0.55
IHL + Ours	0.94	0.52	0.88	0.57

Table 8: Validity of the Expectation as an Importance Score. Results using Llama2-7B model.

Loss	Method	Forget 1%	Forget 5%	Forget 10%	AVG Gain (↑)
GD	FILA	-3.30	-12.53	-17.27	-
	ExpILA	-2.90	-12.18	-6.84	2.64
	AbsILA	-2.90	-9.02	-9.06	2.95
	VILA	-2.17	-1.40	-1.18	8.36
NPO	FILA	-3.30	-11.18	-11.06	-
	ExpILA	-1.27	-12.18	-5.48	4.72
	AbsILA	-2.52	-12.18	-4.26	3.92
	VILA	-1.54	-4.32	-4.59	5.74
IHL	FILA	-3.30	-0.95	-0.47	-
	ExpILA	-1.27	-0.10	-0.34	7.27
	AbsILA	-0.78	-0.01	-0.23	7.50
	VILA	-1.27	-0.20	-0.40	7.22

Table 9: Sensitivity of unlearning performance to LoRA initialization standard deviation ($\sigma$). Results show Forget Quality scores (lower is better) for Phi-1.5B model. “X” indicates unstable training or divergence. Best results per loss function are in bold.

Method	$\sigma = 0.01$	$\sigma = 0.05$	$\sigma = 0.10$	$\sigma = 0.20$	$\sigma = 0.30$	$\sigma = 0.40$	$\sigma = 0.50$
GD	-13.54	-12.41	-8.59	-12.13	-10.29	X	X
NPO	X	-10.80	-11.32	-9.30	-9.79	X	X
IHL	X	-2.02	-9.54	-11.06	-10.54	X	X

Table 10: Ablation study on the sensitivity of VILA to the extent of importance map application. Results show Forget Quality scores (lower is better). Best performance per method is shown in bold.

Method	0% (Baseline)	25% Layers	50% Layers	75% Layers	100% (VILA)
GD + VILA	-16.61	-0.23	-0.83	-0.47	-1.18
IHL + VILA	-7.70	-0.01	-0.03	-0.29	-0.40
NPO + VILA	-13.84	-3.94	-4.76	-5.29	-4.59

Equations

$$\min_{\theta}\; \mathbb{E}_{(x,y)\in\mathcal{D}_{f}}\left[\mathcal{L}_{f}(y\mid x;\theta)\right] + \lambda\,\mathbb{E}_{(x,y)\in\mathcal{D}_{r}}\left[\mathcal{L}_{r}(y\mid x;\theta)\right].$$

$$F_{\theta}(\mathcal{D}) = \mathbb{E}_{\mathcal{D}}\left[\left(\frac{\partial}{\partial\theta}\log p_{\theta}(\mathcal{D})\right)^{2}\right] \approx \frac{1}{|\mathcal{D}|}\sum_{x\in\mathcal{D}}\left(\frac{\partial}{\partial\theta}\mathcal{L}_{\text{LM}}(x;\theta)\right)^{2}.$$

$$\mathcal{M}(\mathcal{D}) = \frac{F_{\theta}(\mathcal{D}_{f})}{F_{\theta}(\mathcal{D}_{r})}.$$

$$B^{*}, A^{*} = \arg\min_{B,A}\sum_{i,j}\left([\mathcal{M}]_{i,j}\,(W-BA)_{i,j}\right)^{2}.$$

$$\mathrm{Var}_{\mathcal{D}}[\Delta W] := \mathbb{E}_{\mathcal{D}}\left[\left(\frac{\partial}{\partial W}\log p_{W}(\mathcal{D})\right)^{2}\right] - \left(\mathbb{E}_{\mathcal{D}}\left[\frac{\partial}{\partial W}\log p_{W}(\mathcal{D})\right]\right)^{2}.$$

$$\mathrm{Var}_{\mathcal{D}}[\Delta W] \approx \mathrm{Var}_{\mathcal{D}}[\Delta B]\,\mathrm{Var}_{\mathcal{D}}[\Delta A].$$

$$\Delta W = BA.$$

$$\mathrm{Var}_{\mathcal{D}}[\Delta W] \approx \mathrm{Var}_{\mathcal{D}}[\Delta B]\,\mathrm{Var}_{\mathcal{D}}[\Delta A].$$

$$\mathcal{M}(\mathcal{D}) = \frac{\mathrm{Var}_{\mathcal{D}_{f}}[\Delta W]}{\mathrm{Var}_{\mathcal{D}_{r}}[\Delta W]} \approx \frac{\mathrm{Var}_{\mathcal{D}_{f}}[\Delta B]\,\mathrm{Var}_{\mathcal{D}_{f}}[\Delta A]}{\mathrm{Var}_{\mathcal{D}_{r}}[\Delta B]\,\mathrm{Var}_{\mathcal{D}_{r}}[\Delta A]}$$

$$\mathrm{Var}_{\mathcal{D}_{f}}[\Delta B],\ \mathrm{Var}_{\mathcal{D}_{f}}[\Delta A],\ \mathrm{Var}_{\mathcal{D}_{r}}[\Delta B],\ \mathrm{Var}_{\mathcal{D}_{r}}[\Delta A]$$

$$\mathcal{M}(\mathcal{D}) = \frac{\mathrm{Var}_{\mathcal{D}_{f}}[\Delta B]\,\mathrm{Var}_{\mathcal{D}_{f}}[\Delta A]}{\mathrm{Var}_{\mathcal{D}_{r}}[\Delta B]\,\mathrm{Var}_{\mathcal{D}_{r}}[\Delta A]}$$

$$B^{*}, A^{*} = \arg\min_{B,A}\sum_{i,j}\left(\mathcal{M}_{ij}\,(W-BA)_{ij}\right)^{2}$$

$$B^{\prime}, A^{\prime} = \arg\min_{B^{*},A^{*}}\; \mathbb{E}_{(x,y)\in\mathcal{D}_{f}}\left[\mathcal{L}_{f}(y\mid x;\theta)\right] + \lambda\,\mathbb{E}_{(x,y)\in\mathcal{D}_{r}}\left[\mathcal{L}_{r}(y\mid x;\theta)\right]$$

$$\mathcal{M}_{\text{ExpILA}} := \frac{\mathbb{E}_{\mathcal{D}_{f}}\left[\frac{\partial}{\partial W}\log p_{W}(\mathcal{D}_{f})\right]}{\mathbb{E}_{\mathcal{D}_{r}}\left[\frac{\partial}{\partial W}\log p_{W}(\mathcal{D}_{r})\right]},\qquad \mathcal{M}_{\text{AbsILA}} := \frac{\left|\mathbb{E}_{\mathcal{D}_{f}}\left[\frac{\partial}{\partial W}\log p_{W}(\mathcal{D}_{f})\right]\right|}{\left|\mathbb{E}_{\mathcal{D}_{r}}\left[\frac{\partial}{\partial W}\log p_{W}(\mathcal{D}_{r})\right]\right|}$$

$$\Delta W = BA = (B_{0}+\Delta B)(A_{0}+\Delta A),$$

$$\Delta W \approx \Delta B\,\Delta A.$$

$$\Delta W_{ij} \approx \sum_{k=1}^{r}\Delta B_{ik}\,\Delta A_{kj}.$$

$$\mathrm{Var}_{\mathcal{D}}\left(\sum_{k=1}^{r}\Delta B_{ik}\Delta A_{kj}\right) = \sum_{k=1}^{r}\mathrm{Var}_{\mathcal{D}}[\Delta B_{ik}\Delta A_{kj}] + \sum_{k\neq k^{\prime}}\mathrm{Cov}_{\mathcal{D}}\left(\Delta B_{ik}\Delta A_{kj},\ \Delta B_{ik^{\prime}}\Delta A_{k^{\prime}j}\right).$$

$$\mathrm{Cov}_{\mathcal{D}}\left(\Delta B_{ik}\Delta A_{kj},\ \Delta B_{ik^{\prime}}\Delta A_{k^{\prime}j}\right) = \mathbb{E}_{\mathcal{D}}\left[\Delta B_{ik}\Delta A_{kj}\Delta B_{ik^{\prime}}\Delta A_{k^{\prime}j}\right] - \mathbb{E}_{\mathcal{D}}\left[\Delta B_{ik}\Delta A_{kj}\right]\mathbb{E}_{\mathcal{D}}\left[\Delta B_{ik^{\prime}}\Delta A_{k^{\prime}j}\right].$$

$$\mathrm{Cov}_{\mathcal{D}}\left(\Delta B_{ik}\Delta A_{kj},\ \Delta B_{ik^{\prime}}\Delta A_{k^{\prime}j}\right) = 0.$$

$$\mathrm{Var}_{\mathcal{D}}[\Delta W_{ij}] = \sum_{k=1}^{r}\mathrm{Var}_{\mathcal{D}}[\Delta B_{ik}\Delta A_{kj}].$$

$$\mathrm{Var}_{\mathcal{D}}[\Delta B_{ik}\Delta A_{kj}] = \mathbb{E}_{\mathcal{D}}[\Delta B_{ik}^{2}]\,\mathbb{E}_{\mathcal{D}}[\Delta A_{kj}^{2}] - \left(\mathbb{E}_{\mathcal{D}}[\Delta B_{ik}]\,\mathbb{E}_{\mathcal{D}}[\Delta A_{kj}]\right)^{2}.$$

$$\mathrm{Var}_{\mathcal{D}}[\Delta B_{ik}\Delta A_{kj}] \approx \mathrm{Var}_{\mathcal{D}}[\Delta B_{ik}]\,\mathrm{Var}_{\mathcal{D}}[\Delta A_{kj}].$$

$$\mathrm{Var}_{\mathcal{D}}[\Delta W_{ij}] \approx \sum_{k=1}^{r}\mathrm{Var}_{\mathcal{D}}[\Delta B_{ik}]\,\mathrm{Var}_{\mathcal{D}}[\Delta A_{kj}].$$

$$\mathrm{Cov}_{\mathcal{D}}(\Delta B_{ik}, \Delta A_{kj}) = 0 \quad \text{for all } i, j, k.$$

$$\mathbb{E}_{\mathcal{D}}\left[\Delta B_{ik}\Delta A_{kj}\Delta B_{ik^{\prime}}\Delta A_{k^{\prime}j}\right]$$

$$\mathbb{E}_{\mathcal{D}}[\Delta B_{ik}]\cdot\mathbb{E}_{\mathcal{D}}[\Delta A_{kj}]\cdot\mathbb{E}_{\mathcal{D}}[\Delta B_{ik^{\prime}}]\cdot\mathbb{E}_{\mathcal{D}}[\Delta A_{k^{\prime}j}].$$

$$\mathrm{Var}_{\mathcal{D}}[\Delta A_{kj}] = \mathbb{E}_{\mathcal{D}}[(\Delta A_{kj})^{2}] - \left(\mathbb{E}_{\mathcal{D}}[\Delta A_{kj}]\right)^{2},\qquad \mathrm{Var}_{\mathcal{D}}[\Delta B_{ik}] = \mathbb{E}_{\mathcal{D}}[(\Delta B_{ik})^{2}] - \left(\mathbb{E}_{\mathcal{D}}[\Delta B_{ik}]\right)^{2}.$$

$$\mathcal{L}_{\text{GD}}(\theta) = -\mathbb{E}_{(x,y)\sim\mathcal{D}_{f}}\left[-\log p(y\mid x;\theta)\right] + \mathbb{E}_{(x,y)\sim\mathcal{D}_{r}}\left[-\log p(y\mid x;\theta)\right].$$

$$\mathcal{L}_{\text{NPO}}(\theta) = -\frac{2}{\beta}\,\mathbb{E}_{(x,y)\sim\mathcal{D}_{f}}\left[\log\sigma\left(-\beta\log\frac{p(y\mid x;\theta)}{p(y\mid x;\theta_{\text{ref}})}\right)\right] + \mathbb{E}_{(x,y)\sim\mathcal{D}_{r}}\left[-\log p(y\mid x;\theta)\right].$$

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning

Full text

Improving Fisher Information Estimation and Efficiency for LoRA-based LLM Unlearning

Yejin Kim*

Sogang University

Eunwon Kim*

Sogang University

Buru Chang†

Korea University

Junsuk Choe†

Sogang University

* Co-first authors. † Co-corresponding authors.

Abstract

LLMs have demonstrated remarkable performance across various tasks but face challenges related to unintentionally generating outputs containing sensitive information. A straightforward approach to address this issue is to retrain the model after excluding the problematic data. However, this approach incurs prohibitively high computational costs. To overcome this limitation, machine unlearning has emerged as a promising solution that can effectively remove sensitive information without the need to retrain the model from scratch. Recently, FILA has been proposed as a parameter-efficient unlearning method by integrating LoRA adapters. Specifically, it calculates the Fisher information to identify parameters associated with the forget set and assigns them to LoRA adapters for updates. Despite its innovative approach, FILA still requires access to all model parameters and does not adequately account for fundamental assumptions underlying Fisher information, leading to inaccuracies in importance estimation. To address these limitations, we propose VILA, a novel unlearning framework that explicitly considers the assumptions overlooked in FILA, thereby enhancing the accuracy of parameter identification for the forget set. Moreover, VILA significantly reduces computational costs by enabling parameter identification without accessing the entire model. Our method achieves up to 100× higher parameter efficiency and 40× faster training speed compared to FILA, and sets new state-of-the-art performance on benchmarks including TOFU, WMDP, and MUSE. Our code is available at https://github.com/kyj93790/VILA.

1 Introduction

Large Language Models (LLMs) are driving remarkable progress across a wide range of applications. However, they also exhibit a critical risk: the tendency to memorize and regenerate sensitive personal information or copyrighted content from their training data. For instance, Brown et al. (2022) have shown that LLMs often output personal identifiers such as email addresses and phone numbers from the training corpus. Similarly, LLMs are known to reproduce copyrighted materials, such as passages from Harry Potter, with high fidelity (Eldan & Russinovich, 2023). These issues raise serious concerns about privacy violations and intellectual property infringement. As a result, there is growing demand for methods that can effectively remove sensitive or proprietary information from LLMs.

The most straightforward way to remove specific information from a model is to retrain it from scratch without the corresponding data (i.e., exact unlearning). However, given the massive size of LLMs and their extensive training corpora, this approach is computationally expensive and time-consuming. To address this challenge, recent research has focused on methods that aim to eliminate the information to be forgotten without full retraining (i.e., approximate unlearning). For example, loss-based techniques such as Gradient Ascent (GA) (Jang et al., 2023), Negative Preference Optimization (NPO) (Zhang et al., 2024), and Inverted Hinge Loss (IHL) (Cha et al., 2025) have been proposed to reduce the likelihood of generating specific content through fine-tuning.

Nevertheless, directly updating billions of parameters remains computationally demanding, even when applying approximate unlearning techniques. To alleviate this burden, Fisher-Initialization of Low-rank Adapters (FILA) (Cha et al., 2025) has been introduced. FILA leverages Fisher information (Fisher, 1922) to estimate gradient variance and identify parameters most closely related to the data to be forgotten. These parameters are isolated from the base model by assigning them to a LoRA adapter (Hu et al., 2022). Unlearning is then performed exclusively on the adapters. This enables parameter-efficient unlearning while minimizing the impact on the retained knowledge.

However, our analysis reveals two critical limitations of FILA. First, the Fisher information used by FILA does not accurately represent parameter importance in the machine unlearning setting. For Fisher information to indicate importance, the distribution of the forget set must match that of the full dataset. However, the forget set typically constitutes only a small fraction of the entire dataset, inevitably leading to a statistical discrepancy between the forget set and the full dataset. FILA overlooks this discrepancy, which results in a forget importance map that inaccurately captures the association between the forget set and the parameters. Moreover, although FILA is designed for parameter-efficient unlearning, it still requires computing full gradients for all model parameters to construct the importance map. This significantly undermines its computational efficiency. Our analysis shows that the cost of FILA grows rapidly with the size of the forget set. When forgetting 10% of the dataset, the initialization time exceeds that of full model retraining—highlighting a serious limitation in scalability (refer to Section 4.2).

Building on the above analysis, we propose a precise and scalable approach, Variance-based Importance estimation and efficient Low-rank Adaptation (VILA). Our method improves the estimation of parameter importance by explicitly considering the distributional shift of the forget set. Furthermore, we construct the forget importance map solely using the gradients from the LoRA adapters, resulting in up to a 40× speedup and approximately 100× reduction in memory consumption compared to FILA as the size of the forget set increases.

We evaluate our method on multiple LLMs, including Phi-1.5B (Li et al., 2023), Llama2-7B (Touvron et al., 2023), Zephyr-7B (Tunstall et al., 2024) and ICLM-7B (Shi et al., 2024), in combination with existing unlearning loss functions such as GA, NPO, and IHL. Experimental results on the TOFU (Maini et al., 2024), MUSE Books (Shi et al., 2025), WMDP Bio and WMDP Cyber (Li et al., 2024) benchmarks demonstrate that our method not only improves resource efficiency but also sets a new state-of-the-art in unlearning performance.

2 Related Work

LLM Unlearning aims to eliminate the influence of specific data from large language models without incurring the cost of expensive retraining. This approach addresses various challenges, such as preserving privacy, resolving copyright issues, and removing hazardous knowledge (Brown et al., 2022; Eldan & Russinovich, 2023; Li et al., 2024).

Several studies focus primarily on modifying the loss function to induce unlearning. A representative example is Gradient Ascent (GA), which increases the loss on the forget data to reduce the model’s predictive accuracy on that data (Jang et al., 2023). A limitation of GA is that it can easily degrade performance on retain data (Maini et al., 2024). To address this issue, Gradient Difference (GD) performs gradient ascent on forget data to eliminate their influence while applying gradient descent on retain data to preserve the model’s generalization ability (Liu et al., 2022a). Negative Preference Optimization (NPO) (Zhang et al., 2024) builds on an LLM alignment approach (Rafailov et al., 2024). By reweighting gradients during learning, NPO mitigates the excessive unlearning commonly caused by GA and significantly improves the stability of the unlearning process. Most recently, Inverted Hinge Loss (IHL) (Cha et al., 2025) promotes unlearning by decreasing the probability of the forget token while increasing the probability of the highest-probability alternative token.
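
To make the GD and NPO objectives concrete, the following sketch evaluates both from per-example sequence log-likelihoods, following the forms $\mathcal{L}_{\text{GD}}$ and $\mathcal{L}_{\text{NPO}}$ listed with the paper's equations. The function names, the toy log-likelihoods, and the default `beta=0.1` are illustrative assumptions, not the authors' implementation or settings.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def mean(values):
    return sum(values) / len(values)

def gd_loss(forget_logps, retain_logps):
    """Gradient Difference: ascend on forget NLL, descend on retain NLL."""
    forget_nll = mean([-lp for lp in forget_logps])
    retain_nll = mean([-lp for lp in retain_logps])
    return -forget_nll + retain_nll

def npo_loss(forget_logps, ref_forget_logps, retain_logps, beta=0.1):
    """NPO forget term (relative to a reference model) plus retain NLL."""
    forget_term = mean([
        log_sigmoid(-beta * (lp - ref_lp))
        for lp, ref_lp in zip(forget_logps, ref_forget_logps)
    ])
    retain_nll = mean([-lp for lp in retain_logps])
    return -(2.0 / beta) * forget_term + retain_nll

# Hypothetical sequence log-likelihoods log p(y | x; theta).
forget = [-1.0, -2.0]
retain = [-0.5, -1.5]
print(gd_loss(forget, retain))           # -0.5
print(npo_loss(forget, forget, retain))  # reference equals current model here
```

When the current and reference models agree, the NPO forget term reduces to $(2/\beta)\log 2$, which gives a quick sanity check on an implementation.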

Beyond loss function-based methods, various approaches have been proposed. Task Arithmetic (Ilharco et al., 2023) defines the difference between a model fine-tuned only on the forget set and the original model as a task vector, which is then subtracted from the original model to induce forgetting. This approach, known as Forgetting via Negation, has been shown to be effective in making LLMs unlearn harmful language generation or fail at performing specific tasks. ULD (Ji et al., 2024) utilizes an auxiliary LLM to achieve the unlearning objective during the decoding process of an LLM. The auxiliary LLM is trained to actively memorize the forget set while simultaneously forgetting the retain set. The unlearned LLM is then obtained by calculating the logit difference between the auxiliary LLM and the original model. FILA (Cha et al., 2025) employs LoRA adapters (Lermen & Rogers-Smith, 2024) to improve the computational efficiency of LLM unlearning. To achieve this, FILA identifies parameters associated with the forget set and initializes the LoRA adapters to be strongly correlated with the forget set, while the base layer is initialized to be closely related to the retain set. Subsequently, the LoRA adapters are fine-tuned using unlearning loss functions. FILA is the work most closely related to ours, as we also focus on achieving parameter-efficient unlearning.
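
As a small illustration of Forgetting via Negation, the sketch below subtracts a task vector from toy parameters. The dictionary layout, the function name, and the `scale` coefficient are assumptions made for the example only.

```python
def negate_task_vector(original, finetuned_on_forget, scale=1.0):
    """theta_new = theta - scale * (theta_ft - theta), applied per parameter."""
    return {
        name: [w - scale * (w_ft - w)
               for w, w_ft in zip(weights, finetuned_on_forget[name])]
        for name, weights in original.items()
    }

# Hypothetical two-parameter "model" before and after fine-tuning on the forget set.
theta = {"layer": [0.5, -1.0]}
theta_ft = {"layer": [0.75, -1.5]}
print(negate_task_vector(theta, theta_ft))  # {'layer': [0.25, -0.5]}
```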

3 Preliminaries

3.1 Problem Definition

The goal of unlearning is to effectively eliminate the knowledge associated with a specified forget set $\mathcal{D}_{f}$ in the LLM, without retraining the model from scratch. At the same time, the model is expected to preserve its performance on a retain set $\mathcal{D}_{r}$ , which contains knowledge that must be maintained. This objective can be formulated as the following optimization problem:

$$\min_{\theta}\; \mathbb{E}_{(x,y)\in\mathcal{D}_{f}}\left[\mathcal{L}_{f}(y\mid x;\theta)\right] + \lambda\,\mathbb{E}_{(x,y)\in\mathcal{D}_{r}}\left[\mathcal{L}_{r}(y\mid x;\theta)\right].$$

In this formulation, $\mathcal{L}_{f}$ denotes the loss function applied to the forget set $\mathcal{D}_{f}$ , encouraging the model to remove the corresponding knowledge. On the other hand, $\mathcal{L}_{r}$ is the loss function applied to the retain set $\mathcal{D}_{r}$ , which ensures that essential knowledge is preserved. The model parameters are represented by $\theta$ , which are updated during the unlearning process. The hyperparameter $\lambda$ controls the strength of the retention loss term, effectively regulating how strongly the model is penalized for deviating from the retain set.

3.2 FILA: Fisher-Initialization of Low-rank Adapters

FILA achieves parameter-efficient unlearning by identifying the parameters critical to the forget set and concentrating updates on them through LoRA. The overall procedure is as follows.

Low-rank Adaptation (LoRA). LoRA approximates the parameter update $\Delta W$ of a model’s base weight matrix $W$ by training an adapter composed of two low-rank matrices, $B$ and $A$ , such that $\Delta W=BA$ . The adapter is then added to $W$ to produce the final model. Since $B$ and $A$ contain far fewer parameters than $W$ , this approach enables efficient fine-tuning of LLMs at substantially lower computational cost.
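
For a sense of scale, the sketch below counts parameters for a full weight matrix and for its adapter. The 4096 x 4096 layer shape is a hypothetical example; rank 8 is the value the experiments report for TOFU.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (#params in W, #params in the adapter matrices B and A)."""
    full = d_out * d_in
    adapter = d_out * rank + rank * d_in  # B is d_out x rank, A is rank x d_in
    return full, adapter

full, adapter = lora_param_counts(4096, 4096, 8)
print(full, adapter, full // adapter)  # 16777216 65536 256
```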

Forget Importance Map Extraction. FILA employs Fisher information (FI) to identify parameters associated with the forget set. The FI of a dataset $\mathcal{D}$ with respect to model parameters $\theta$ is defined as:

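$$F_{\theta}(\mathcal{D}) = \mathbb{E}_{\mathcal{D}}\left[\left(\frac{\partial}{\partial\theta}\log p_{\theta}(\mathcal{D})\right)^{2}\right] \approx \frac{1}{|\mathcal{D}|}\sum_{x\in\mathcal{D}}\left(\frac{\partial}{\partial\theta}\mathcal{L}_{\text{LM}}(x;\theta)\right)^{2}.$$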

$\mathcal{L}_{\text{LM}}$ denotes the next-token prediction loss used in the pre-trained language model. The FI measures the variance of the score function, which is the gradient of the log-likelihood with respect to the model parameters. Intuitively, it captures how sensitively the model output changes in response to perturbations in each parameter. A higher FI value indicates that the parameter plays a more critical role in modeling the dataset $\mathcal{D}$ .

Based on this interpretation, FILA computes the ratio of FI values obtained from the forget set and retain set to determine how important each parameter is with respect to the forget set. This ratio is referred to as the forget importance map, denoted as $\mathcal{M}(\mathcal{D})$ :

$$\mathcal{M}(\mathcal{D}) = \frac{F_{\theta}(\mathcal{D}_{f})}{F_{\theta}(\mathcal{D}_{r})}.$$

The computed $\mathcal{M}(\mathcal{D})$ plays a critical role in assigning weights to important parameters in LoRA-based efficient unlearning.
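
A minimal sketch of this step, assuming per-sample gradients are already available as flat lists: the empirical FI is the mean of squared gradients, and the importance map is the element-wise ratio between the forget and retain sets. The `eps` guard and all names are illustrative additions, not part of FILA.

```python
def empirical_fisher(per_sample_grads):
    """Mean of squared per-sample gradients, one value per parameter."""
    n = len(per_sample_grads)
    n_params = len(per_sample_grads[0])
    return [sum(g[p] ** 2 for g in per_sample_grads) / n for p in range(n_params)]

def forget_importance_map(forget_grads, retain_grads, eps=1e-12):
    """Element-wise ratio F(D_f) / F(D_r); eps is an added guard against division by zero."""
    f_forget = empirical_fisher(forget_grads)
    f_retain = empirical_fisher(retain_grads)
    return [ff / (fr + eps) for ff, fr in zip(f_forget, f_retain)]

# Hypothetical per-sample gradients for a model with two parameters.
forget_grads = [[2.0, 0.1], [2.0, -0.1]]
retain_grads = [[1.0, 1.0], [-1.0, 1.0]]
importance = forget_importance_map(forget_grads, retain_grads)
print(importance)  # roughly [4.0, 0.01]: the first parameter matters more for forgetting
```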

LoRA Initialization with Forget Importance Map. FILA modifies the initialization of both the base layer and the LoRA adapter in a way that is suitable for unlearning by leveraging the forget importance map. First, FILA formulates the following Weighted Low-Rank Approximation (WLRA) objective to obtain $B^{*}$ and $A^{*}$ :

$$B^{*}, A^{*} = \arg\min_{B,A}\sum_{i,j}\left([\mathcal{M}]_{i,j}\,(W-BA)_{i,j}\right)^{2}.$$

Since the forget importance map $\mathcal{M}$ assigns larger weights in WLRA to parameters more relevant to the forget set, the resulting product $B^{*}A^{*}$ captures the components of the original weight matrix $W$ that have significant influence on the forget set. Based on this, FILA initializes the LoRA matrices $B$ and $A$ with $B^{*}$ and $A^{*}$ , respectively, so that the adapter focuses on forget-set-related parameters. FILA then obtains $W^{*}$ by subtracting $B^{*}A^{*}$ from $W$ , using it as the new base layer. Since $B^{*}A^{*}$ concentrates information specific to the forget set, the subtraction $W^{*}=W-B^{*}A^{*}$ removes forget-set-related parameters while preserving those relevant to the retain set.

Through this initialization of both the adapter and the base layer, the overall model parameters remain unchanged, as $W=(W-B^{*}A^{*})+B^{*}A^{*}$ . However, the information associated with the forget and retain sets becomes cleanly disentangled.
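
The sketch below evaluates the WLRA objective for a given $(B, A)$ and checks that splitting $W$ into a new base layer plus the adapter leaves the merged weights unchanged. It does not solve the minimization itself; the toy matrices and helper names are assumptions for illustration.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def wlra_objective(M, W, B, A):
    """Sum over i, j of (M_ij * (W - BA)_ij)^2."""
    BA = matmul(B, A)
    return sum((M[i][j] * (W[i][j] - BA[i][j])) ** 2
               for i in range(len(W)) for j in range(len(W[0])))

# Hypothetical 2x2 weight, rank-1 adapter, and importance map.
W = [[1.0, 2.0], [2.0, 4.0]]
M = [[3.0, 1.0], [1.0, 1.0]]
B = [[1.0], [2.0]]
A = [[1.0, 2.0]]

BA = matmul(B, A)
W_star = [[W[i][j] - BA[i][j] for j in range(2)] for i in range(2)]       # new frozen base
merged = [[W_star[i][j] + BA[i][j] for j in range(2)] for i in range(2)]  # base + adapter

print(wlra_objective(M, W, B, A))  # 0.0 here because this W is exactly rank 1
print(merged == W)                 # True: the initialization leaves the model unchanged
```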

Parameter-efficient Unlearning. After initializing LoRA, FILA freezes the base layer and updates only the LoRA adapter parameters using an unlearning loss. Since parameters crucial for the forget set are allocated to the trainable adapter, while those important to the retain set remain in the frozen base layer, the model can effectively erase the undesired information while preserving essential knowledge. The final unlearned model is obtained by merging the updated adapter—now purged of forget set information—with the base layer.

4 VILA: The Proposed Method

4.1 Corrected Parameter Importance Estimation

We argue that the forget importance map calculated by FILA is inaccurate. FILA estimates the Fisher information (FI) of each parameter with respect to the forget set and the retain set, then interprets FI as a variance measure to derive the forget importance map based on the ratio of these FIs. However, this approach overlooks a critical assumption required to interpret FI as a variance: the expectation of the score function (i.e., the gradient) must be zero (Fisher, 1922). This condition holds only when the distribution of the forget set matches the distribution of the entire training data. In machine unlearning tasks, however, the forget set is typically a subset of data that has been intentionally selected for removal, making its distribution inherently different from that of the entire dataset. As a result, the score function has a non-zero expectation, violating the necessary assumption. To reliably identify parameters strongly associated with the forget set, it is essential to account for distributional discrepancies that arise in unlearning scenarios.
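
A toy Bernoulli model illustrates the zero-mean condition. With the parameter fitted to the full data the average score vanishes, while on a skewed subset it does not. The data and the model are hypothetical and chosen only to make the arithmetic exact.

```python
def bernoulli_score(x, theta):
    """d/d(theta) of log p(x; theta) for a Bernoulli model."""
    return x / theta - (1 - x) / (1 - theta)

def mean(values):
    return sum(values) / len(values)

full_data = [1, 0, 1, 0, 1, 0, 1, 0]  # theta = 0.5 fits this data exactly
forget_subset = [1, 1, 1, 0]          # a skewed, hand-picked subset
theta = 0.5

full_mean_score = mean([bernoulli_score(x, theta) for x in full_data])
forget_mean_score = mean([bernoulli_score(x, theta) for x in forget_subset])
print(full_mean_score, forget_mean_score)  # 0.0 1.0
```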

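To address this issue, we correct the FI (Equation 2) by explicitly subtracting the squared expectation of the score function from the original formulation. The resulting quantity is the variance of the update $\Delta W$ over the dataset $\mathcal{D}$ :

$$\mathrm{Var}_{\mathcal{D}}[\Delta W] := \mathbb{E}_{\mathcal{D}}\left[\left(\frac{\partial}{\partial W}\log p_{W}(\mathcal{D})\right)^{2}\right] - \left(\mathbb{E}_{\mathcal{D}}\left[\frac{\partial}{\partial W}\log p_{W}(\mathcal{D})\right]\right)^{2}.$$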

We regard this modified quantity as an adjusted importance score for the dataset. Experimental results demonstrate that this modification significantly improves unlearning performance. While the solution is simple, identifying and correctly addressing this overlooked aspect in existing work constitutes one of the key contributions of this paper.
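
The effect of the correction can be seen on a single parameter. In the sketch below, with hypothetical per-sample gradients whose mean is far from zero, the uncorrected second moment is dominated by the squared mean, whereas the corrected score keeps only the spread.

```python
def second_moment(grads):
    """Uncorrected FI estimate: E[g^2]."""
    return sum(g * g for g in grads) / len(grads)

def corrected_importance(grads):
    """E[g^2] - (E[g])^2, i.e. the variance of the per-sample gradients."""
    mean_grad = sum(grads) / len(grads)
    return second_moment(grads) - mean_grad ** 2

# Hypothetical per-sample gradients of one parameter on a forget set.
grads = [1.0, 1.5, 0.5]
print(second_moment(grads))         # about 1.167
print(corrected_importance(grads))  # about 0.167
```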

4.2 Improving Efficiency via Low-rank Approximation

Despite its intended goal, FILA is not computationally efficient. While FILA aims to perform parameter-efficient unlearning by adopting LoRA, it still requires access to the entire set of LLM parameters to compute the forget importance map. As a result, the importance map calculation remains computationally expensive.

We report the computational time required for model retraining, importance map extraction using FILA, and model unlearning in Table 1, empirically demonstrating these inefficiencies. Notably, the extraction of the forget importance map, intended as a preprocessing step, incurs even greater computational cost than the unlearning process itself. This inefficiency becomes especially pronounced when the forget set constitutes approximately 10% of the training data, where importance map computation exceeds the time required for retraining. These results suggest that FILA does not scale well with forget set size, making it suboptimal in terms of efficiency. Thus, achieving truly efficient unlearning necessitates a more efficient approach to extracting the forget importance map.

To address this issue, we propose an approach that utilizes the gradients of LoRA adapter rather than those of the entire model. First, we initialize the LoRA adapter matrices $B$ and $A$ independently, following a Gaussian distribution with a mean of zero. Next, we add the adapter $BA$ to the original model parameter $W$ and compute the gradients of $B$ and $A$ for a given input data $\mathcal{D}$ . Using these gradients, we calculate $\mathrm{Var}_{\mathcal{D}}[\Delta B]$ and $\mathrm{Var}_{\mathcal{D}}[\Delta A]$ , respectively. We then multiply these two values to obtain the variance of the model parameters ${W}$ :

[TABLE]
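The variance step described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the paper's implementation: per-sample gradients of $B$ and $A$ are assumed to be already available as nested lists, the population variance is used, and "multiply" is read as a matrix product so that a $d \times r$ variance map and an $r \times k$ variance map yield a $d \times k$ map. The helper names `elementwise_variance` and `combine_variances` and the toy numbers are hypothetical.

```python
from statistics import pvariance


def elementwise_variance(per_sample_grads):
    """Variance of each matrix entry across samples.

    per_sample_grads: list of equally shaped matrices (lists of rows),
    one matrix per data sample.
    """
    rows = len(per_sample_grads[0])
    cols = len(per_sample_grads[0][0])
    return [
        [pvariance([g[i][j] for g in per_sample_grads]) for j in range(cols)]
        for i in range(rows)
    ]


def combine_variances(var_b, var_a):
    """Assumed combination rule: out[i][j] = sum_k var_b[i][k] * var_a[k][j]."""
    rank = len(var_a)
    return [
        [sum(var_b[i][k] * var_a[k][j] for k in range(rank))
         for j in range(len(var_a[0]))]
        for i in range(len(var_b))
    ]


# Toy per-sample gradients (made-up numbers): B is 2x1, A is 1x3, two samples.
grads_b = [[[1.0], [0.0]], [[3.0], [2.0]]]
grads_a = [[[0.0, 2.0, 4.0]], [[2.0, 2.0, 0.0]]]

var_b = elementwise_variance(grads_b)    # shape 2x1
var_a = elementwise_variance(grads_a)    # shape 1x3
var_w = combine_variances(var_b, var_a)  # shape 2x3, same as W
```

Only the adapter-sized maps `var_b` and `var_a` are estimated from data; the full-size map `var_w` is obtained afterwards by a single product.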

One critical aspect of our approach is understanding how the variance of the gradient of the model parameter can be approximated by the variances of the LoRA adapter gradients. To this end, we present the following theorem:

Theorem 1 (Variance Approximation of LoRA Parameter Updates).

Let $\mathcal{D}$ be the input data, $W$ be the model parameter matrix and let $\Delta W$ denote its update. In the LoRA framework, the parameter update $\Delta W$ is represented as the product of two low-rank matrices $B$ and $A$ such that:

$$\Delta W = BA$$

Assuming that both $B$ and $A$ are independently initialized from zero-mean Gaussian distributions, the variance of each element $\Delta W_{ij}$ can be approximated as:

[TABLE]

Proof.

The proof is provided in Appendix A. ∎
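As a quick sanity check on why a product of variances is plausible, the sketch below numerically verifies a standard identity: for independent zero-mean factors, $\mathrm{Var}\left[\sum_k b_k a_k\right] = \sum_k \mathrm{Var}[b_k]\,\mathrm{Var}[a_k]$. This is not the proof from Appendix A, and the exact statement of Theorem 1 may differ in form. The rank, standard deviations, trial count, and function name are illustrative assumptions.

```python
import random


def simulate_product_variance(rank, sigma_b, sigma_a, trials, seed=0):
    """Empirical variance of sum_k b_k * a_k for independent zero-mean Gaussians."""
    rng = random.Random(seed)
    samples = [
        sum(rng.gauss(0.0, sigma_b) * rng.gauss(0.0, sigma_a) for _ in range(rank))
        for _ in range(trials)
    ]
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials


rank, sigma_b, sigma_a = 4, 0.5, 2.0
# Identity under independence and zero mean: sum_k Var(b_k) * Var(a_k).
predicted = rank * sigma_b**2 * sigma_a**2
empirical = simulate_product_variance(rank, sigma_b, sigma_a, trials=20000)
```

With enough trials, `empirical` should land close to `predicted`; the gap shrinks as the number of trials grows.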

Finally, we derive the forget importance map as the element-wise ratio of the importance values calculated for the forget set and the retain set:

[TABLE]

In this way, we efficiently compute the forget importance map without directly accessing the entire parameter set of the LLM. The pseudocode of the unlearning process is given in Algorithm 1.
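The element-wise ratio can be illustrated as follows. This is a minimal sketch that assumes the ratio is taken as forget-set importance divided by retain-set importance. The small constant `eps` is added here only to avoid division by zero and is not taken from the paper; the function name and input values are hypothetical.

```python
def forget_importance_map(var_forget, var_retain, eps=1e-12):
    """Element-wise ratio of forget-set to retain-set importance.

    eps guards against division by zero; it is an assumption of this sketch.
    """
    return [
        [f / (r + eps) for f, r in zip(row_f, row_r)]
        for row_f, row_r in zip(var_forget, var_retain)
    ]


# Made-up importance values for a 2x2 parameter matrix.
var_forget = [[4.0, 1.0], [0.5, 2.0]]
var_retain = [[1.0, 1.0], [2.0, 0.5]]
importance = forget_importance_map(var_forget, var_retain)
```

Entries well above 1 mark parameters that matter more for the forget set than for the retain set, and entries below 1 mark the opposite.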

5 Experiments

Benchmarks and compared methods. We evaluate unlearning performance using three benchmarks: TOFU (Maini et al., 2024), WMDP (Li et al., 2024), and MUSE (Shi et al., 2025), and primarily compare our method against FILA with three unlearning loss functions: GD (Liu et al., 2022b), NPO (Zhang et al., 2024), and IHL (Cha et al., 2025). Further details on benchmarks and compared methods are provided in Appendices H and I, respectively.

Implementation details. All experiments are conducted using two NVIDIA A6000 GPUs with 48GB of memory. The batch size is set to 32 for TOFU and MUSE, and 4 for WMDP. The LoRA rank is set to 8 for TOFU and WMDP, and 16 for MUSE. Weight decay is configured as 0.01 for TOFU and set to 0 for both MUSE and WMDP. We employ a linear learning rate scheduler for WMDP and TOFU, and a constant scheduler for MUSE.

Fair and comprehensive experimental designs. To ensure a fair comparison, we conduct the same number of hyperparameter searches for all compared methods. Specifically, we perform random search (Bergstra & Bengio, 2012) within a predefined hyperparameter range for each benchmark. We set the maximum unlearning epoch based on retraining cost considerations. Furthermore, to avoid evaluating models with significantly degraded utility, we select models that maintain at least 95% of the original model utility (Ilharco et al., 2023) while achieving the highest forgetting score. As the compared methods demonstrate comparable model utility, we report only the forgetting performance in the following tables. Additional details are provided in Appendix J.
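The model selection rule can be sketched as a filter followed by an argmax. The record fields (`utility`, `forget_score`), the run names, and all numbers below are hypothetical, and the sketch assumes that a higher forgetting score is better, which may not match the metric of every benchmark.

```python
def select_model(candidates, original_utility, min_fraction=0.95):
    """Pick the run with the highest forgetting score among those that keep
    at least min_fraction of the original model's utility."""
    eligible = [
        c for c in candidates
        if c["utility"] >= min_fraction * original_utility
    ]
    if not eligible:
        return None  # no run preserved enough utility
    return max(eligible, key=lambda c: c["forget_score"])


# Made-up results from a hyperparameter search.
runs = [
    {"name": "run_a", "utility": 0.60, "forget_score": 0.90},  # utility too low
    {"name": "run_b", "utility": 0.62, "forget_score": 0.70},
    {"name": "run_c", "utility": 0.63, "forget_score": 0.55},
]
best = select_model(runs, original_utility=0.64)
```

Here `run_a` forgets the most but falls below the 95% utility threshold, so it is excluded before the forgetting scores are compared.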

Bibliography


Bergstra & Bengio (2012) James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The journal of machine learning research, 13(1):281–305, 2012.
Brown et al. (2022) Hannah Brown, Katherine Lee, Fatemehsadat Mireshghallah, Reza Shokri, and Florian Tramèr. What does it mean for a language model to preserve privacy? In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’22, pp. 2280–2292, New York, NY, USA, 2022. Association for Computing Machinery. ISBN 9781450393522. doi: 10.1145/3531146.3534642. URL https://doi.org/10.1145/3531146.3534642.
Cha et al. (2025) Sungmin Cha, Sungjun Cho, Dasol Hwang, and Moontae Lee. Towards robust and parameter-efficient knowledge unlearning for llms. In International Conference on Learning Representations, 2025.
Eldan & Russinovich (2023) Ronen Eldan and Mark Russinovich. Who’s harry potter? approximate unlearning in llms, 2023.
Fisher (1922) Ronald A Fisher. On the mathematical foundations of theoretical statistics. Philosophical transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character, 222(594-604):309–368, 1922.
Hendrycks et al. (2021) Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. Proceedings of the International Conference on Learning Representations (ICLR), 2021.
Hu et al. (2022) Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9.
Ilharco et al. (2023) Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Ludwig Schmidt, Hannaneh Hajishirzi, and Ali Farhadi. Editing models with task arithmetic. In The International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=6t0Kwf8-jrj.
