Co$^2$PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning
Xiangjue Dong, Ziwei Zhu, Zhuoer Wang, Maria Teleki, James Caverlee

TL;DR
Co$^2$PT is a prompt tuning method that reduces social biases in pre-trained language models during downstream task adaptation by adding a counterfactual contrastive objective to prompt tuning.
Contribution
The paper presents Co$^2$PT, a prompt tuning approach that mitigates social biases in language models through counterfactual contrastive learning and that can also be applied on top of existing upstream debiased models.
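Counterfactual contrastive learning needs pairs of inputs that differ only in a demographic attribute. One common way to build them is counterfactual data augmentation: swap each demographic term for its counterpart. The sketch below illustrates that idea for a binary gender attribute; the word-pair list `GENDER_PAIRS` and the function names are hypothetical and far smaller than a real list, and the paper's actual augmentation procedure is not described in this summary.

```python
import re

# Hypothetical, deliberately tiny word-pair list for a binary gender attribute.
# A real setup would use a curated list and handle ambiguous words such as "her"
# (which can map to "him" or "his"); this sketch does not.
GENDER_PAIRS = [("he", "she"), ("him", "her"), ("man", "woman"),
                ("men", "women"), ("father", "mother"), ("son", "daughter")]


def build_swap_map(pairs):
    """Make a symmetric lookup table: each term maps to its counterpart."""
    swap = {}
    for a, b in pairs:
        swap[a] = b
        swap[b] = a
    return swap


SWAP = build_swap_map(GENDER_PAIRS)


def counterfactual(sentence: str) -> str:
    """Return the sentence with every listed term swapped, keeping initial capitals."""
    def repl(match):
        word = match.group(0)
        target = SWAP.get(word.lower())
        if target is None:
            return word  # not a demographic term: leave unchanged
        return target.capitalize() if word[0].isupper() else target

    # Replace whole words only, so "man" inside "manager" is untouched.
    return re.sub(r"\b\w+\b", repl, sentence)


if __name__ == "__main__":
    print(counterfactual("He is a father."))
```

Each original sentence and its counterfactual then form a positive pair for the contrastive objective.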
Findings
Effective bias mitigation demonstrated on three extrinsic bias benchmarks.
Compatible with existing upstream debiased models.
Enhances bias reduction during downstream task tuning.
Abstract
Pre-trained Language Models are widely used in many important real-world applications. However, recent studies show that these models can encode social biases from large pre-training corpora and even amplify biases in downstream applications. To address this challenge, we propose Co$^2$PT, an efficient and effective debias-while-prompt tuning method for mitigating biases via counterfactual contrastive prompt tuning on downstream tasks. Our experiments conducted on three extrinsic bias benchmarks demonstrate the effectiveness of Co$^2$PT on bias mitigation during the prompt tuning process and its adaptability to existing upstream debiased language models. These findings indicate the strength of Co$^2$PT and provide promising avenues for further enhancement in bias mitigation on downstream tasks.
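The contrastive part of a counterfactual contrastive objective can be illustrated with an InfoNCE-style loss: each sentence representation is pulled toward the representation of its own counterfactual and pushed away from the other counterfactuals in the batch, and this term is added to the task loss. This is a sketch under stated assumptions, not the paper's implementation: cosine similarity, the temperature of 0.05, the weight `alpha` and all function names are illustrative choices. In actual prompt tuning the vectors would come from the language model run with trainable prompts, and only the prompt parameters would be updated.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(orig: List[Sequence[float]],
                     cf: List[Sequence[float]],
                     temperature: float = 0.05) -> float:
    """InfoNCE-style loss: orig[i] and cf[i] are a positive pair,
    and cf[j] for j != i serve as in-batch negatives."""
    n = len(orig)
    total = 0.0
    for i in range(n):
        logits = [cosine(orig[i], cf[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / n


def total_loss(task_loss: float,
               orig: List[Sequence[float]],
               cf: List[Sequence[float]],
               alpha: float = 1.0,
               temperature: float = 0.05) -> float:
    """Hypothetical combined objective: task loss plus weighted contrastive term."""
    return task_loss + alpha * contrastive_loss(orig, cf, temperature)


if __name__ == "__main__":
    aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned, swapped)
```

When each representation matches its counterfactual the loss is near zero, and when it is closer to another example's counterfactual the loss is large. Minimizing it encourages the model to represent a sentence and its demographic counterfactual similarly.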
