Collapsed Language Models Promote Fairness
Jingxuan Xu, Wuyang Chen, Linyi Li, Yao Zhao, Yunchao Wei

TL;DR
This paper investigates the phenomenon of Neural Collapse in language models and leverages it to develop a principled fine-tuning method that enhances fairness without sacrificing task performance.
Contribution
It introduces a novel understanding of fairness-related biases through Neural Collapse and proposes a new fine-tuning approach to improve fairness across various debiasing techniques.
Findings
Debiased models show collapsed alignment between last-layer token representations and word embeddings.
The proposed method improves fairness across multiple debiasing approaches.
Fairness is enhanced while maintaining language model performance.
Abstract
To mitigate societal biases implicitly encoded in recent successful pretrained language models, a diverse array of approaches has been proposed to encourage model fairness, focusing on prompting, data augmentation, regularized fine-tuning, and more. Despite this progress, it is nontrivial to reach a principled understanding of fairness and an effective algorithm that can consistently debias language models. In this work, by rigorous evaluations of Neural Collapse -- a learning phenomenon that occurs in last-layer representations and classifiers in deep networks -- on fairness-related words, we find that debiased language models exhibit collapsed alignment between token representations and word embeddings. More importantly, this observation inspires us to design a principled fine-tuning method that can effectively improve fairness in a wide range of debiasing methods, while still…
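The alignment described in the abstract can be illustrated with a minimal sketch. It is not the paper's exact formulation: the function names `alignment_gap` and `regularized_loss`, the cosine-based score, the centering choice, and the weight `lam` are all assumptions made for illustration. The sketch measures how far the centered class-mean token representations are from their corresponding word embeddings, then adds that gap to a task loss as a regularizer.

```python
import numpy as np

def alignment_gap(token_reps, labels, word_embeddings):
    """Hypothetical NC3-style score: mean (1 - cosine) between centered
    class-mean representations and the matching word embeddings.

    token_reps:      (n_tokens, dim) last-layer representations
    labels:          (n_tokens,) integer word ids, used as row indices
    word_embeddings: (vocab, dim) embedding / classifier matrix
    """
    classes = np.unique(labels)
    # Mean representation of all tokens that share a target word.
    class_means = np.stack(
        [token_reps[labels == c].mean(axis=0) for c in classes]
    )
    # Center by the mean of the class means (an assumption of this sketch).
    centered = class_means - class_means.mean(axis=0, keepdims=True)
    targets = word_embeddings[classes]
    cos = np.sum(centered * targets, axis=1) / (
        np.linalg.norm(centered, axis=1) * np.linalg.norm(targets, axis=1) + 1e-12
    )
    # 0 means perfectly aligned, 2 means exactly opposed.
    return float(np.mean(1.0 - cos))

def regularized_loss(task_loss, gap, lam=0.1):
    """Task loss plus a weighted alignment penalty (lam is illustrative)."""
    return task_loss + lam * gap
```

In an actual fine-tuning loop the same quantity would be computed with a differentiable framework over the fairness-related words only; the NumPy version here is only meant to make the measured quantity concrete.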
Peer Reviews
Decision·ICLR 2025 Poster
1. Analyzing the debiased language models from the perspective of neural collapse is novel. This offers valuable insights into the structure of the model's embedding space across different debiasing approaches.
2. The proposed method is simple but effective and can be easily integrated with other approaches. It's interesting to see that by only adding an NC-based regularization term in the loss function, all the compared debiasing methods can be enhanced.
3. The experiments are comprehensive and
1. **Clarity of Notation**: Some notations are not clearly defined, which could impact readability.
   - The class embedding variances mentioned on line 161 and line 240 could benefit from a more explicit definition, similar to how it is presented in Eq. 3 of the Linguistic Collapse[1] paper.
   - In Eq. 1, to align with the token representation defined at line 137, the notation should ideally be ${h}(E(x_{1:t}))$ rather than ${h}(x_{1:t})$ for clarity and consistency.
2. **Evaluation Metri
### Strengths
- The paper studies a missing link between neural collapse and fairness and proposes a bias mitigation strategy based on neural collapse objective that is agnostic to pre-training or fine-tuning methods. This also avoids manual data balancing or filtering.
- Both intrinsic and extrinsic fairness metrics see improvements with this approach.
### Weakness
- Table 9 does not bold the best results, which makes it a little harder to read.
- While the paper is well-written, simple to follow and proposes a simple technique which works, all the experiments conducted are on BERT-based MLM models. I understand this is intentional to ensure fair comparison with prior work, but this also limits understanding of the degradation on non-MLM tasks. For example, in Table 9, the results reported are on the GLUE benchmark which are all multiple ch
The paper is logically structured and easy to follow, with a clear progression from problem statement to method proposal and experimental validation.
1. In Section 4, the details of the datasets and metrics can be more concisely presented, with some details moved to the appendix. Conversely, the discussion of results is somewhat inadequate. It would be beneficial to highlight the advantages of the proposed method in comparison to the baselines.
2. Additional analysis should be included in section 4.2. From the results alone, it is observed that many outcomes with the addition of (U)NC3 have significantly decreased.
3. It is unclear whether th
Taxonomy
Topics: Ethics and Social Impacts of AI
