Collapsed Language Models Promote Fairness

Jingxuan Xu; Wuyang Chen; Linyi Li; Yao Zhao; Yunchao Wei

arXiv:2410.04472·cs.CL·January 30, 2025

Open Access · 1 Repo · 3 Reviews

TL;DR

This paper investigates the phenomenon of Neural Collapse in language models and leverages it to develop a principled fine-tuning method that enhances fairness without sacrificing task performance.

Contribution

It introduces a novel understanding of fairness-related biases through Neural Collapse and proposes a new fine-tuning approach to improve fairness across various debiasing techniques.

Findings

01. Debiased models show collapsed alignment in last-layer representations.

02. The proposed method improves fairness across multiple debiasing approaches.

03. Fairness is enhanced while maintaining language model performance.

Abstract

To mitigate societal biases implicitly encoded in recent successful pretrained language models, a diverse array of approaches has been proposed to encourage model fairness, focusing on prompting, data augmentation, regularized fine-tuning, and more. Despite this progress, it is nontrivial to reach a principled understanding of fairness and an effective algorithm that can consistently debias language models. In this work, by rigorous evaluations of Neural Collapse (a learning phenomenon that occurs in the last-layer representations and classifiers of deep networks) on fairness-related words, we find that debiased language models exhibit collapsed alignment between token representations and word embeddings. More importantly, this observation inspires us to design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods, while still…
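As a rough illustration of what "alignment between token representations and word embeddings" could mean, the sketch below averages last-layer representations per fairness-related word and measures their cosine similarity to that word's embedding. It is a toy under stated assumptions, not the paper's metric: the function names, the absence of global mean-centering, and the two-dimensional vectors are all hypothetical choices made for readability.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def class_means(reps, labels):
    """Average the representations that share the same target word."""
    sums, counts = {}, {}
    for h, y in zip(reps, labels):
        if y not in sums:
            sums[y] = [0.0] * len(h)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], h)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def alignment_score(reps, labels, word_embeddings):
    """Mean cosine similarity between each word's class-mean
    representation and its embedding (assumed proxy; 1.0 means
    perfectly aligned directions)."""
    means = class_means(reps, labels)
    sims = [cosine(means[y], word_embeddings[y]) for y in means]
    return sum(sims) / len(sims)

# Hypothetical toy data: two target words, 2-D vectors.
reps = [[1.0, 0.1], [0.9, -0.1], [0.0, 1.0], [0.2, 0.8]]
labels = ["he", "he", "she", "she"]
word_embeddings = {"he": [1.0, 0.0], "she": [0.0, 1.0]}

score = alignment_score(reps, labels, word_embeddings)
print(round(score, 3))
```

A score near 1 indicates that class-mean representations point in nearly the same direction as their word embeddings; a score like this could in principle be turned into a regularization term, though how the paper actually does so is not stated in the excerpt above.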

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. Analyzing the debiased language models from the perspective of neural collapse is novel. This offers valuable insights into the structure of the model's embedding space across different debiasing approaches.
2. The proposed method is simple but effective and can be easily integrated with other approaches. It's interesting to see that by only adding an NC-based regularization term in the loss function, all the compared debiasing methods can be enhanced.
3. The experiments are comprehensive and

Weaknesses

1. **Clarity of Notations**: Some notations are not clearly defined, which could impact readability.
   - The class embedding variances mentioned on line 161 and line 240 could benefit from a more explicit definition, similar to how it is presented in Eq. 3 of the Linguistic Collapse[1] paper.
   - In Eq. 1, to align with the token representation defined at line 137, the notation should ideally be ${h}(E(x_{1:t}))$ rather than ${h}(x_{1:t})$ for clarity and consistency.
2. **Evaluation Metri

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- The paper studies a missing link between neural collapse and fairness and proposes a bias mitigation strategy based on a neural collapse objective that is agnostic to pre-training or fine-tuning methods. This also avoids manual data balancing or filtering.
- Both intrinsic and extrinsic fairness metrics see improvements with this approach.

Weaknesses

- Table 9 does not bold the best results, which makes it a little harder to read.
- While the paper is well written, simple to follow, and proposes a simple technique that works, all the experiments are conducted on BERT-based MLM models. I understand this is intentional to ensure fair comparison with prior work, but it also limits understanding of the degradation on non-MLM tasks. For example, in Table 9, the results reported are on the GLUE benchmark which are all multiple ch

Reviewer 03 · Rating 5 · Confidence 4

Strengths

The paper is logically structured and easy to follow, with a clear progression from problem statement to method proposal and experimental validation.

Weaknesses

1. In Section 4, the details of the datasets and metrics can be more concisely presented, with some details moved to the appendix. Conversely, the discussion of results is somewhat inadequate. It would be beneficial to highlight the advantages of the proposed method in comparison to the baselines.
2. Additional analysis should be included in Section 4.2. From the results alone, it is observed that many outcomes with the addition of (U)NC3 have significantly decreased.
3. It is unclear whether th

Code & Models

Repositories

Xujxyang/Fairness-NC-main
PyTorch · Official

Taxonomy

Topics · Ethics and Social Impacts of AI