Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature

Angelo Porrello; Pietro Buzzega; Felix Dangel; Thomas Sommariva; Riccardo Salami; Lorenzo Bonicelli; Simone Calderara

arXiv:2602.17385·cs.AI·May 22, 2026

TL;DR

This paper introduces a novel, data-free regularization method based on Kronecker-Factored Approximate Curvature to improve task vector disentanglement in task arithmetic, enhancing modularity and robustness without external data.
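For readers unfamiliar with the setting, the following is a minimal sketch of standard task arithmetic: a task vector is the difference between fine-tuned and pre-trained weights, addition sums scaled task vectors onto the pre-trained weights, and negation subtracts one. The function names, the toy parameter dictionaries, and the scaling coefficient `alpha` are illustrative assumptions, not code from the paper.

```python
import numpy as np

def task_vector(theta_ft, theta_0):
    # Task vector: fine-tuned weights minus pre-trained weights, per parameter.
    return {k: theta_ft[k] - theta_0[k] for k in theta_0}

def apply_task_vectors(theta_0, task_vectors, alpha=1.0, sign=1.0):
    # sign=+1.0 performs task addition, sign=-1.0 performs task negation.
    merged = {k: v.copy() for k, v in theta_0.items()}
    for tau in task_vectors:
        for k in merged:
            merged[k] += sign * alpha * tau[k]
    return merged

# Toy "models" (hypothetical): two fine-tuned copies of one pre-trained model.
rng = np.random.default_rng(0)
theta_0 = {"w": rng.normal(size=(3, 2)), "b": np.zeros(3)}
theta_a = {k: v + 0.1 for k, v in theta_0.items()}
theta_b = {k: v - 0.2 for k, v in theta_0.items()}

tau_a = task_vector(theta_a, theta_0)
tau_b = task_vector(theta_b, theta_0)

added = apply_task_vectors(theta_0, [tau_a, tau_b], alpha=1.0)
negated = apply_task_vectors(theta_0, [tau_a], alpha=1.0, sign=-1.0)
```

In practice `alpha` is usually a tuned scaling coefficient; the robustness to task vector rescaling mentioned in the findings concerns sensitivity to this kind of factor.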

Contribution

It proposes a dataless regularization technique based on curvature matrix approximation that achieves state-of-the-art results in task addition and negation, with complexity that is constant in the number of tasks.

Findings

01. Achieves state-of-the-art results in task addition and negation.

02. Eliminates the need for held-out tuning.

03. Promotes robustness to task vector rescaling.

Abstract

Task Arithmetic yields a modular, scalable way to adapt foundation models. Combining multiple task vectors, however, can lead to cross-task interference, causing representation drift and degraded performance. Representation drift regularization provides a natural remedy to disentangle task vectors; however, existing approaches typically require external task data, conflicting with modularity and data availability constraints (e.g., privacy requirements). We propose a dataless approach by framing regularization against representation drift as a curvature matrix approximation problem. This allows us to leverage well-established techniques; in particular, we adopt Kronecker-Factored Approximate Curvature and obtain a practical regularizer that achieves state-of-the-art results in task addition and negation. Our method has constant complexity in the number of tasks and promotes robustness…
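To make the curvature-based idea more concrete, here is a rough sketch of a Kronecker-factored quadratic penalty on the weight update of a single linear layer. It relies on the identity vec(ΔW)ᵀ (A ⊗ G) vec(ΔW) = tr(ΔWᵀ G ΔW A) for symmetric A, so the Kronecker product never has to be formed. The function names, the random positive semidefinite factors, and in particular the factor-averaging rule used for the merged variant are assumptions made for illustration; the paper's actual regularizer and merging rule may differ.

```python
import numpy as np

def kfac_penalty(delta_w, A, G):
    # Quadratic form vec(dW)^T (A kron G) vec(dW), computed as
    # tr(dW^T G dW A) without building the Kronecker product (A symmetric).
    return float(np.trace(delta_w.T @ G @ delta_w @ A))

def kfac_penalty_dense(delta_w, A, G):
    # Reference version that materialises A kron G; only viable for tiny layers.
    v = delta_w.reshape(-1, order="F")  # column-stacking vec
    return float(v @ np.kron(A, G) @ v)

def random_psd(rng, n):
    # Stand-in for a precomputed curvature factor (hypothetical values).
    m = rng.normal(size=(n, n))
    return m @ m.T / n

rng = np.random.default_rng(0)
d_out, d_in, num_tasks = 3, 4, 5
# One (input-side, output-side) factor pair per previously seen task.
factors = [(random_psd(rng, d_in), random_psd(rng, d_out)) for _ in range(num_tasks)]
delta_w = rng.normal(size=(d_out, d_in))

# Naive variant: one penalty term per task, so cost grows with the task count.
naive = sum(kfac_penalty(delta_w, A, G) for A, G in factors)

# Merged variant (assumed rule): average the factors once, then evaluate a
# single term. This is an approximation and need not equal the naive sum.
A_bar = sum(A for A, _ in factors) / num_tasks
G_bar = sum(G for _, G in factors) / num_tasks
merged = num_tasks * kfac_penalty(delta_w, A_bar, G_bar)
```

The factors are fixed matrices, so evaluating the penalty during fine-tuning on a new task needs no data from other tasks; only the factors have to be shared.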

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The theoretical derivation is elegant, connecting representation drift in a linearized model to a practical, data-free regularizer via the GGN and KFAC.
- In the linearized fine-tuning setting, the proposed method performs on par with or better than the data-dependent $\tau$Jp baseline.
- KFAC regularization at training time allows simple weight averaging (task arithmetic) to outperform more complex, SOTA post-hoc merging methods (like TSV) that are applied to unregularized vectors if training

Weaknesses

- The entire experimental validation is confined to the linearized finetuning framework (see, e.g., Ortiz-Jimenez et al., 2023). The authors don't test their KFAC regularizer on full, non-linear finetuning. Is the regularizer only effective in a linearized regime?
- The paper's validation against SOTA merging methods (TIES, TSV, etc.) is limited to this niche linearized FT setting (Figure 4). This avoids the most practical question: how do these methods perform on task vectors from standard, non

Reviewer 02 · Rating 4 · Confidence 4

Strengths

• Conceptual clarity: The drift penalty is cleanly derived from the linearization, tying representation drift to the GGN and enabling reuse of mature curvature approximations (Sec. 3.1–3.3).
• Scalability: The merged KFAC surrogate (Eq. 8) achieves O(1) cost in tasks and empirically matches the naïve O(T) sum. Table 3 shows near-parity on ViT-B/16 and T5.
• Strong empirical results: On 8-Vision, KFAC-regularized TA outperforms linear and non-linear FT, matches or beats τ-JP in several settin

Weaknesses

• Missing baseline: No comparison to Task-Localized Sparse FT; given the shared goal (localized updates with low interference), this absence limits claims of state-of-the-art effectiveness on task-local editing.
• Attention-only baseline not fully aligned: While “Non-linear (Attn.)” appears, the paper doesn’t replicate the full protocol and metrics of Fine-Tuning Attention Modules Only, so the reader can’t conclude whether KFAC-regularized FT beats that specific method under its strengths (e.g.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. Quality: Good. The paper presents a well-motivated and technically sound contribution, with strong theoretical grounding in second-order optimization and empirical evidence supporting its claims. Experimental evaluation is relatively comprehensive, covering diverse architectures and tasks, with consistent metrics and clear ablation studies isolating the contribution of curvature-based disentanglement.
2. Clarity: The paper is very well written and easy to read.
3. Significance: The proposed

Weaknesses

1. Minor novelty concern: the paper seems relatively incremental, as it builds on the key ideas of Ortiz-Jimenez et al. (2023) and Yoshida et al. (2025).

Code & Models

Models

🤗 aimagelab-ta/TAK · model · ♡ 1

Videos

Dataless Weight Disentanglement in Task Arithmetic via Kronecker-Factored Approximate Curvature · slideslive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Mobile Crowdsensing and Crowdsourcing