Distilling Linearized Behavior for Effective Task Arithmetic
Thomas Sommariva, Francesca Morandi, Simone Calderara, Angelo Porrello

TL;DR
This paper introduces a method to distill linearized behavior into non-linear models, enabling effective task vector composition for model editing without additional inference costs.
Contribution
The paper proposes a distillation approach that enforces linearity with respect to weight perturbations through activation-space constraints during training, bridging linear and non-linear fine-tuning for better task arithmetic.
Findings
Distilled non-linear models inherit the task composition properties of linearized models.
Achieves strong performance on vision and language benchmarks.
No inference-time overhead introduced.
Abstract
Task vector composition has emerged as a promising paradigm for editing pre-trained models, enabling model merging through addition and unlearning through subtraction. Fine-tuning in the tangent space of a pre-trained model (linear fine-tuning) has proven effective, as it produces task vectors that are naturally disentangled and resistant to interference. However, linearized models suffer from limited expressivity during training and incur higher computational costs at inference time, which restrict their practical applicability. In this work, we bridge the gap between linear and standard non-linear fine-tuning. We show that linearity with respect to weight perturbations, a property defined in parameter space, can be enforced through constraints in activation space during training. Concretely, we distill hidden representations from a curvature-regularized linearized teacher into a…
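To make the terms in the abstract concrete, here is a minimal sketch of task arithmetic on a toy model. It is not the paper's distillation method: the model `f`, the parameter values, and the helper names (`task_vector`, `apply_task_vectors`, `f_lin`) are all hypothetical and chosen only for illustration. It shows merging by addition, unlearning by subtraction, and why a model linearized around the pre-trained weights composes task vectors without interference, whereas the non-linear model generally does not.

```python
import math

def task_vector(finetuned, pretrained):
    """tau = theta_ft - theta_0, computed per parameter."""
    return [ft - p for ft, p in zip(finetuned, pretrained)]

def apply_task_vectors(pretrained, vectors, alpha=1.0):
    """theta = theta_0 + alpha * sum(tau_i); a negative alpha subtracts (unlearning)."""
    edited = list(pretrained)
    for tau in vectors:
        edited = [w + alpha * t for w, t in zip(edited, tau)]
    return edited

def f(theta, x):
    """Toy non-linear model (assumption): tanh of a dot product."""
    return math.tanh(sum(w * xi for w, xi in zip(theta, x)))

def f_lin(theta, theta0, x):
    """First-order Taylor expansion of f around theta0 (tangent-space model)."""
    z0 = sum(w * xi for w, xi in zip(theta0, x))
    grad = [(1.0 - math.tanh(z0) ** 2) * xi for xi in x]  # gradient of f w.r.t. theta at theta0
    return math.tanh(z0) + sum(g * (w - w0) for g, w, w0 in zip(grad, theta, theta0))

# Hypothetical pre-trained weights and two fine-tuned variants.
theta0 = [0.1, -0.2, 0.3]
theta_a = [0.4, -0.2, 0.3]
theta_b = [0.1, 0.5, 0.3]

tau_a = task_vector(theta_a, theta0)
tau_b = task_vector(theta_b, theta0)

merged = apply_task_vectors(theta0, [tau_a, tau_b])           # addition: merge both tasks
unlearned = apply_task_vectors(theta0, [tau_a], alpha=-1.0)   # subtraction: negate task A

x = [1.0, 2.0, -1.0]
base = f(theta0, x)

# Linearized model: the effect of the merged edit equals the sum of individual effects.
lin_sum = (f_lin(theta_a, theta0, x) - base) + (f_lin(theta_b, theta0, x) - base)
lin_merged = f_lin(merged, theta0, x) - base

# Non-linear model: the two quantities generally differ (interference).
nonlin_sum = (f(theta_a, x) - base) + (f(theta_b, x) - base)
nonlin_merged = f(merged, x) - base

print("linearized gap:", abs(lin_sum - lin_merged))
print("non-linear gap:", abs(nonlin_sum - nonlin_merged))
```

In this toy setting the linearized gap is zero up to floating-point error, while the non-linear gap is not. That additivity is the property the abstract describes as task vectors being "naturally disentangled".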
