TL;DR
This paper systematically evaluates the fairness of task arithmetic in model editing, demonstrating its potential to achieve competitive accuracy while reducing bias disparities compared to traditional fine-tuning methods.
Contribution
It introduces the first comprehensive analysis of group fairness in task arithmetic, compares it with fine-tuning and LoRA, and provides a theoretical bound linking task vector scaling to fairness metrics.
Findings
Task vectors can be tuned for fairness without sacrificing accuracy.
Merging subgroup-specific task vectors improves fairness outcomes.
A theoretical bound relates task vector scaling to fairness metrics.
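The first two findings rest on plain parameter arithmetic, which can be sketched as follows. This is a minimal illustration, not the authors' code: parameters are toy dicts of float lists standing in for a model state dict, the helper names `task_vector` and `apply_task_vectors` are hypothetical, and applying a single shared coefficient `lam` to the sum of subgroup vectors is an assumption about how merging and scaling combine.

```python
from typing import Dict, List

# Toy stand-in for a model state dict (assumption: real models use tensors).
Params = Dict[str, List[float]]


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    """tau = theta_finetuned - theta_pretrained, computed per parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def apply_task_vectors(pretrained: Params, vectors: List[Params], lam: float) -> Params:
    """theta_edited = theta_pretrained + lam * sum_i tau_i."""
    edited: Params = {}
    for name, base in pretrained.items():
        # Sum the task vectors element-wise, then scale by lam.
        summed = [sum(v[name][i] for v in vectors) for i in range(len(base))]
        edited[name] = [b + lam * s for b, s in zip(base, summed)]
    return edited


theta_pre = {"w": [1.0, 2.0]}
theta_group_a = {"w": [2.0, 2.0]}  # hypothetical model fine-tuned on subgroup A
theta_group_b = {"w": [1.0, 4.0]}  # hypothetical model fine-tuned on subgroup B

tau_a = task_vector(theta_pre, theta_group_a)  # {"w": [1.0, 0.0]}
tau_b = task_vector(theta_pre, theta_group_b)  # {"w": [0.0, 2.0]}

edited = apply_task_vectors(theta_pre, [tau_a, tau_b], lam=0.5)
print(edited)  # {'w': [1.5, 3.0]}
```

Sweeping `lam` over a grid and measuring accuracy and fairness at each value is one way to tune the trade-off described above; per-vector coefficients would be a natural variant.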
Abstract
Model editing techniques, particularly task arithmetic with task vectors, offer an efficient alternative to full fine-tuning by enabling direct parameter updates through simple arithmetic operations. While this approach promises substantial computational savings, its impact on fairness has remained largely unexplored, despite growing concern over biased outcomes in high-stakes applications such as hate speech detection. In this work, we present the first systematic study of group fairness in task arithmetic within the binary text and image classification regime, comparing it against full fine-tuning (FFT) and Low-Rank Adaptation (LoRA). We evaluate across multiple language models and datasets using standard group fairness metrics, including Demographic Parity and Equalized Odds. Our analysis shows that task vectors can be tuned to achieve competitive accuracy while reducing…
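The two fairness metrics named in the abstract can be computed for binary predictions roughly as below. This sketch uses common textbook definitions, which may differ in detail from the paper's: the Demographic Parity difference is the gap between the highest and lowest positive-prediction rate across groups, and the Equalized Odds difference is the larger of the true-positive-rate gap and the false-positive-rate gap. The function names and the toy data are illustrative assumptions.

```python
from typing import List, Sequence


def _positive_rate(y_pred: Sequence[int], mask: Sequence[bool]) -> float:
    """Fraction of positive predictions among the examples selected by mask.

    Assumes the mask selects at least one example.
    """
    selected = [p for p, keep in zip(y_pred, mask) if keep]
    return sum(selected) / len(selected)


def demographic_parity_difference(y_pred: Sequence[int], groups: Sequence[str]) -> float:
    """max_g P(y_hat = 1 | g) - min_g P(y_hat = 1 | g)."""
    rates = [
        _positive_rate(y_pred, [g == name for g in groups])
        for name in sorted(set(groups))
    ]
    return max(rates) - min(rates)


def equalized_odds_difference(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> float:
    """Larger of the TPR gap and the FPR gap across groups."""
    gaps: List[float] = []
    for label in (1, 0):  # label 1 yields the TPR gap, label 0 the FPR gap
        rates = [
            _positive_rate(
                y_pred, [g == name and t == label for g, t in zip(groups, y_true)]
            )
            for name in sorted(set(groups))
        ]
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Toy data: four examples per group (hypothetical, for illustration only).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpd = demographic_parity_difference(y_pred, groups)      # 0.75 - 0.25
eod = equalized_odds_difference(y_true, y_pred, groups)  # max(TPR gap, FPR gap)
print(dpd, eod)  # 0.5 0.5
```

Both values are 0 for a perfectly group-fair classifier under these definitions, so lower is better.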
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper presents the first systematic comparative analysis of fairness in task arithmetic, filling a critical gap in understanding the societal implications of this efficient model editing paradigm. 2. The scaling coefficient technique is demonstrated to be theoretically grounded and empirically viable.
1. The core methodological approach relies on manually tuning a unified scalar coefficient to balance accuracy and fairness. However, for complex real-world scenarios involving multidimensional fairness constraints (such as simultaneous optimization across gender, race, and age), this single-scalar control mechanism appears overly simplistic and lacks scalability. Manually identifying the optimal $\lambda$ configuration for every possible subgroup combination is inefficient and impractical. 2. Th
1. It presents a comprehensive evaluation comparing full fine-tuning (FFT), Low-Rank Adaptation (LoRA), task-vector editing, and a hybrid approach that injects task vectors into FFT, systematically analyzing their effects on fairness metrics and predictive performance. 2. It demonstrates that fairness can be achieved through task-vector scaling, showing that adjusting scaling coefficients effectively improves fairness while maintaining model accuracy. 3. It integrates task vectors from underrepr
1. The experimental scope is limited to binary classification on two related tasks (hate speech/toxicity detection) with specific demographic annotations, raising questions about generalizability to other NLP tasks. 2. The theoretical bound in Proposition 1 only addresses DPD, leaving EOD, which is equally emphasized empirically, without theoretical grounding. 3. The related work section does not discuss several recent and relevant fairness studies.
1. Analyzing the impact of task vectors on fairness is interesting and seems novel. 2. The paper is well structured and easy to understand.
1. The models used in this paper (Llama2-7b, DistilBERT, Qwen2.5-0.5B) seem out of date. 2. The informal theory is a little hard to understand. 3. The observations in 5.3 and 5.4 seem useful, but Section 5.3, which is titled as an empirical results overview, needs a more general or interesting conclusion. 4. As a benchmark, it should be more comprehensive, for example by including more baselines, datasets, and base models.
