Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge

Yan-Lun Chen; Yi-Ru Wei; Chia-Yi Hsu; Chia-Mu Yu; Chun-Ying Huang; Ying-Dar Lin; Yu-Sung Wu; Wei-Bin Lee

arXiv:2502.20186·cs.CL·February 28, 2025

TL;DR

Layer-Aware Task Arithmetic (LATA) assigns layer-specific weights to task vectors to separate task-specific knowledge from instruction-following behavior, improving multi-task learning and task forgetting in large language models while preserving overall model utility.

Contribution

The paper introduces LATA, a layer-wise weighting method for task vectors that isolates task-specific knowledge from instruction-following components in LLMs more effectively than plain Task Arithmetic.

Findings

1. LATA outperforms existing methods in multi-task learning accuracy.
2. LATA achieves superior task forgetting with minimal output quality degradation.
3. Layer-wise analysis effectively disentangles task-specific and instruction-following knowledge.

Abstract

Large language models (LLMs) demonstrate strong task-specific capabilities through fine-tuning, but merging multiple fine-tuned models often leads to degraded performance due to overlapping instruction-following components. Task Arithmetic (TA), which combines task vectors derived from fine-tuning, enables multi-task learning and task forgetting but struggles to isolate task-specific knowledge from general instruction-following behavior. To address this, we propose Layer-Aware Task Arithmetic (LATA), a novel approach that assigns layer-specific weights to task vectors based on their alignment with instruction-following or task-specific components. By amplifying task-relevant layers and attenuating instruction-following layers, LATA improves task learning and forgetting performance while preserving overall model utility. Experiments on multiple benchmarks, including WikiText-2, GSM8K,…
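The idea in the abstract can be illustrated with a small sketch. The abstract does not state the exact weighting rule, so everything below is an assumption made for illustration: models are plain dicts mapping layer names to flat weight lists, the "instruction vector" is taken as the instruction-tuned model minus the base model, and the hypothetical rule `1 - |cosine similarity|` down-weights layers whose task vector aligns with the instruction vector. A positive `scale` adds the task (learning) and a negative `scale` subtracts it (forgetting), as in standard Task Arithmetic.

```python
import math


def task_vector(finetuned, base):
    """Per-layer difference between fine-tuned and base weights."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def cosine(u, v):
    """Cosine similarity of two flat vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def layer_weights(task_vec, instr_vec):
    """Hypothetical rule (an assumption, not the paper's formula):
    layers aligned with the instruction vector get a weight near 0,
    layers orthogonal to it get a weight near 1."""
    return {
        name: 1.0 - abs(cosine(task_vec[name], instr_vec[name]))
        for name in task_vec
    }


def apply_task_vector(target, task_vec, weights, scale=1.0):
    """Add (scale > 0) or subtract (scale < 0) a layer-weighted task vector."""
    return {
        name: [w + scale * weights[name] * d
               for w, d in zip(target[name], task_vec[name])]
        for name in target
    }


# Toy two-layer "models"; all values are made up for illustration.
base = {"layer0": [0.0, 0.0], "layer1": [0.0, 0.0]}
instruct = {"layer0": [1.0, 0.0], "layer1": [1.0, 0.0]}
task_model = {"layer0": [2.0, 0.0], "layer1": [0.0, 3.0]}

instr_vec = task_vector(instruct, base)
task_vec = task_vector(task_model, base)
weights = layer_weights(task_vec, instr_vec)

learned = apply_task_vector(instruct, task_vec, weights, scale=1.0)
forgotten = apply_task_vector(instruct, task_vec, weights, scale=-1.0)
```

In this toy case `layer0` of the task vector points in the same direction as the instruction vector, so it receives weight 0 and is left untouched, while `layer1` is orthogonal to it and is applied in full. Real models would need per-tensor handling and a weighting rule taken from the paper itself.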

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning