Ethos: Rectifying Language Models in Orthogonal Parameter Space

Lei Gao; Yue Niu; Tingting Tang; Salman Avestimehr; Murali Annavaram

arXiv:2403.08994 · cs.CL · April 2, 2024 · 1 citation

TL;DR

Ethos is a novel method that rectifies language models by identifying and negating undesired knowledge in orthogonal parameter space, effectively reducing bias, toxicity, and memorization without harming overall performance.

Contribution

Ethos introduces a new task arithmetic approach that distinguishes beneficial from undesired knowledge in language models using principal components, improving bias and toxicity mitigation.

Findings

1. More effective than current task arithmetic methods at removing bias and toxicity.

2. Maintains overall model performance.

3. Applicable to debiasing, detoxification, and memorization unlearning.

Abstract

Language models (LMs) have greatly propelled the research on natural language processing. However, LMs also raise concerns regarding the generation of biased or toxic content and the potential disclosure of private information from the training dataset. In this work, we present a new efficient approach, Ethos, that rectifies LMs to mitigate toxicity and bias in outputs and avoid privacy leakage. Ethos is built on task arithmetic. However, unlike current task arithmetic algorithms, Ethos distinguishes general beneficial and undesired knowledge when reconstructing task vectors. Specifically, Ethos first obtains a set of principal components from the pre-trained models using singular value decomposition. Then, by projecting the task vector onto principal components, Ethos identifies the principal components that encode general or undesired knowledge. Ethos performs negating using the task…
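
The pipeline the abstract describes (SVD of the pre-trained weights, projection of the task vector onto the resulting principal components, then negation) can be sketched roughly as follows for a single weight matrix. This is a minimal illustration, not the authors' implementation: the abstract is cut off before it specifies how components are selected and negated, so the magnitude threshold `xi`, the scaling factor `lam`, and the function names `filter_task_vector` and `negate` are all assumptions made for this example.

```python
import numpy as np


def filter_task_vector(w_pre, delta_task, xi=0.03):
    """Project a task vector onto the principal components of w_pre and
    keep only the strongly expressed components.

    ASSUMPTION: components whose projection magnitude falls below
    xi * max|projection| are treated as general knowledge and dropped.
    This selection rule is illustrative, not taken from the text above.
    """
    # Principal components of the pre-trained weight: w_pre = U @ diag(s) @ Vt
    u, _, vt = np.linalg.svd(w_pre, full_matrices=False)
    # Coordinates of the task vector in the orthogonal basis (U, V)
    s_task = u.T @ delta_task @ vt.T
    # Zero out weakly expressed components (hypothetical threshold rule)
    threshold = xi * np.abs(s_task).max()
    s_filtered = np.where(np.abs(s_task) >= threshold, s_task, 0.0)
    # Map the filtered coordinates back to parameter space
    return u @ s_filtered @ vt


def negate(w_pre, delta_filtered, lam=1.0):
    """Standard task-arithmetic negation: subtract the scaled task vector."""
    return w_pre - lam * delta_filtered


# Toy usage with random matrices standing in for one layer's weights.
rng = np.random.default_rng(0)
w_pre = rng.standard_normal((6, 6))                        # pre-trained weight
w_bad = w_pre + 0.1 * rng.standard_normal((6, 6))          # fine-tuned on undesired data
delta = w_bad - w_pre                                      # task vector
w_rectified = negate(w_pre, filter_task_vector(w_pre, delta), lam=1.0)
```

Working in the basis given by `U` and `V` means the filtering acts on orthogonal directions, so dropping one component does not disturb the others. With `xi=0` nothing is filtered and the sketch reduces to plain task-vector negation.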

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training