Paying Alignment Tax with Contrastive Learning

Buse Sibel Korkmaz; Rahul Nair; Elizabeth M. Daly; Antonio del Rio Chanona

arXiv:2505.19327·cs.LG·May 27, 2025

TL;DR

This paper introduces a contrastive learning framework that effectively reduces bias and toxicity in language models while maintaining factual accuracy and knowledge, overcoming trade-offs faced by existing methods.

Contribution

The paper presents a novel contrastive learning approach with dynamic loss scaling that improves bias mitigation and faithfulness preservation simultaneously.

Findings

01 Significant reduction in toxicity across multiple benchmarks.

02 Enhanced faithfulness and knowledge retention in models.

03 First method to improve bias mitigation and accuracy concurrently.

Abstract

Current debiasing approaches often result in a degradation of model capabilities such as factual accuracy and knowledge retention. Through systematic evaluation across multiple benchmarks, we demonstrate that existing debiasing methods face fundamental trade-offs, particularly in smaller models, leading to reduced truthfulness, knowledge loss, or unintelligible outputs. To address these limitations, we propose a contrastive learning framework that learns through carefully constructed positive and negative examples. Our approach introduces contrast computation and dynamic loss scaling to balance bias mitigation with faithfulness preservation. Experimental results across multiple model scales demonstrate that our method achieves substantial improvements in both toxicity reduction and faithfulness preservation. Most importantly, we show that our framework is the first to consistently improve…
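The abstract names two ingredients, a contrast between positive and negative examples and a dynamically scaled loss, but does not give their formulas. The sketch below is only a generic illustration of how such a pair could fit together, not the authors' method. It assumes an InfoNCE-style contrast over sequence log-probabilities and a hypothetical scaling rule that keeps the contrast term comparable in magnitude to the language-modeling loss. All function names, `max_weight`, and `eps` are illustrative assumptions.

```python
import math


def contrast_loss(logp_pos, logp_negs):
    """InfoNCE-style contrast (an assumed form, not taken from the paper).

    Computes -log( e^pos / (e^pos + sum_i e^neg_i) ), which is small when
    the positive example scores well above every negative example.
    """
    scores = [logp_pos] + list(logp_negs)
    m = max(scores)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denom - logp_pos


def dynamic_weight(lm_loss, c_loss, max_weight=10.0, eps=1e-8):
    """Hypothetical dynamic scaling rule.

    Weights the contrast term by the ratio of the two losses so that it
    stays comparable in magnitude to the LM loss, capped at max_weight.
    """
    return min(max_weight, lm_loss / (c_loss + eps))


def total_loss(lm_loss, logp_pos, logp_negs):
    """Combine the LM loss with the dynamically weighted contrast term."""
    c = contrast_loss(logp_pos, logp_negs)
    w = dynamic_weight(lm_loss, c)
    return lm_loss + w * c
```

In a real training loop the weight would typically be treated as a constant (no gradient flows through it), and the log-probabilities would come from the model's scores for a faithful, non-toxic continuation (positive) and a biased or toxic one (negative).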

Taxonomy

Topics: Financial Literacy, Pension, Retirement Analysis · Fiscal Policy and Economic Growth

Methods: Contrastive Learning