TL;DR
This paper introduces a data-free regularization method, based on Kronecker-Factored Approximate Curvature (KFAC), that disentangles task vectors in task arithmetic and improves modularity and robustness.
Contribution
It proposes a dataless regularization technique based on curvature matrix approximation, achieving state-of-the-art results in task addition and negation with complexity that is constant in the number of tasks.
Findings
Achieves state-of-the-art results in task addition and negation.
Eliminates the need for held-out tuning.
Promotes robustness to task vector rescaling.
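For readers unfamiliar with the operations named in the findings, the following is a minimal sketch of task addition, task negation, and task vector rescaling on toy parameter dictionaries. It illustrates standard task arithmetic only, not the paper's regularizer; the function names `task_vector` and `apply_task_vectors` and the scaling coefficient `alpha` are hypothetical names chosen for this illustration.

```python
import numpy as np

def task_vector(pretrained, finetuned):
    """Task vector: fine-tuned weights minus pre-trained weights, per parameter."""
    return {name: finetuned[name] - pretrained[name] for name in pretrained}

def apply_task_vectors(pretrained, vectors, alpha=1.0, negate=False):
    """Add (or subtract, if negate=True) rescaled task vectors to the pre-trained weights."""
    sign = -1.0 if negate else 1.0
    merged = {name: value.copy() for name, value in pretrained.items()}
    for vector in vectors:
        for name in merged:
            merged[name] += sign * alpha * vector[name]
    return merged

# Toy "models" with a single parameter tensor each (illustrative values only).
theta_0 = {"w": np.zeros(2)}
theta_a = {"w": np.array([1.0, 0.0])}
theta_b = {"w": np.array([0.0, 2.0])}

tau_a = task_vector(theta_0, theta_a)
tau_b = task_vector(theta_0, theta_b)

added = apply_task_vectors(theta_0, [tau_a, tau_b], alpha=0.5)   # task addition
negated = apply_task_vectors(theta_0, [tau_a], negate=True)      # task negation
```

In this picture, robustness to rescaling means that the merged model's quality should not depend sharply on the choice of `alpha`.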
Abstract
Task Arithmetic yields a modular, scalable way to adapt foundation models. Combining multiple task vectors, however, can lead to cross-task interference, causing representation drift and degraded performance. Representation drift regularization provides a natural remedy to disentangle task vectors; however, existing approaches typically require external task data, conflicting with modularity and data availability constraints (e.g., privacy requirements). We propose a dataless approach by framing regularization against representation drift as a curvature matrix approximation problem. This allows us to leverage well-established techniques; in particular, we adopt Kronecker-Factored Approximate Curvature and obtain a practical regularizer that achieves state-of-the-art results in task addition and negation. Our method has constant complexity in the number of tasks and promotes robustness…
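To make the idea of treating representation drift as a curvature quantity more concrete, here is a minimal sketch for a single linear layer $y = Wx$. It is a toy under stated assumptions, not the paper's implementation: the function names are hypothetical, the output-side factor `G` is set to the identity for simplicity, and the data are random. The point it shows is that the mean squared output change caused by a weight update `dW` can be computed from a stored input second-moment matrix `A` alone, which is the kind of Kronecker-factored statistic that KFAC maintains, so the raw inputs do not have to be shared.

```python
import numpy as np

def input_second_moment(X):
    """A = (1/N) * sum_i x_i x_i^T for inputs X of shape (N, d_in)."""
    return X.T @ X / X.shape[0]

def drift_from_data(dW, X):
    """Mean squared output change of y = W x when W becomes W + dW (needs the data X)."""
    return float(np.mean(np.sum((X @ dW.T) ** 2, axis=1)))

def kfac_penalty(dW, A, G):
    """Quadratic form vec(dW)^T (A kron G) vec(dW), evaluated as tr(G dW A dW^T)."""
    return float(np.trace(G @ dW @ A @ dW.T))

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))    # toy inputs to the layer (assumption)
dW = rng.normal(size=(3, 5))     # toy weight update, shape (d_out, d_in)

A = input_second_moment(X)       # input-side Kronecker factor
G = np.eye(3)                    # output-side factor; identity is an assumption

with_data = drift_from_data(dW, X)
without_data = kfac_penalty(dW, A, G)   # uses only A and G, not X
```

With `G` equal to the identity, the two quantities agree up to floating-point error, since $\frac{1}{N}\sum_i \lVert \Delta W x_i \rVert^2 = \operatorname{tr}(\Delta W A \Delta W^\top)$.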
Peer Reviews
Decision: ICLR 2026 Poster
- The theoretical derivation is elegant, connecting representation drift in a linearized model to a practical, data-free regularizer via the GGN and KFAC.
- In the linearized fine-tuning setting, the proposed method performs on-par with or better than the data-dependent $\tau$Jp baseline.
- KFAC regularization at training time allows simple weight averaging (task arithmetic) to outperform more complex, SOTA post-hoc merging methods (like TSV) that are applied to unregularized vectors if training
- The entire experimental validation is confined to the linearized finetuning framework (see, e.g., Ortiz-Jimenez et al., 2023). The authors don't test their KFAC regularizer on full, non-linear finetuning. Is the regularizer only effective in a linearized regime?
- The paper's validation against SOTA merging methods (TIES, TSV, etc.) is limited to this niche linearized FT setting (Figure 4). This avoids the most practical question: how do these methods perform on task vectors from standard, non
• Conceptual clarity: The drift penalty is cleanly derived from the linearization, tying representation drift to the GGN and enabling reuse of mature curvature approximations (Sec. 3.1–3.3).
• Scalability: The merged KFAC surrogate (Eq. 8) achieves O(1) cost in tasks and empirically matches the naïve O(T) sum. Table 3 shows near-parity on ViT-B/16 and T5.
• Strong empirical results: On 8-Vision, KFAC-regularized TA outperforms linear and non-linear FT, matches or beats τ-JP in several settin
• Missing baseline: No comparison to Task-Localized Sparse FT; given the shared goal (localized updates with low interference), this absence limits claims of state-of-the-art effectiveness on task-local editing.
• Attention-only baseline not fully aligned: While “Non-linear (Attn.)” appears, the paper doesn’t replicate the full protocol and metrics of Fine-Tuning Attention Modules Only, so the reader can’t conclude whether KFAC-regularized FT beats that specific method under its strengths (e.g.
1. Quality: Good. The paper presents a well-motivated and technically sound contribution, with strong theoretical grounding in second-order optimization and empirical evidence supporting its claims. Experimental evaluation is relatively comprehensive, covering diverse architectures and tasks, with consistent metrics and clear ablation studies isolating the contribution of curvature-based disentanglement.
2. Clarity: The paper is very well written and easy to read.
3. Significance: The proposed
1. Minor novelty concern: the paper seems relatively incremental, as it builds on the key ideas of Ortiz-Jimenez et al. (2023) and Yoshida et al. (2025).
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Mobile Crowdsensing and Crowdsourcing
