Soft-prompt Tuning for Large Language Models to Evaluate Bias
Jacob-Junqi Tian, David Emerson, Sevil Zanjani Miyandoab, Deval Pandya, Laleh Seyyed-Kalantari, Faiza Khan Khattak

TL;DR
This paper investigates using soft-prompt tuning on large language models to evaluate and identify biases in sentiment classification tasks, aiming to reduce human bias in prompt design.
Contribution
It introduces a bias evaluation method using soft-prompts that avoids human bias injection and provides insights into model biases across sensitive attributes.
Findings
Identified bias patterns in LLMs for different sensitive attributes
Demonstrated effectiveness of soft-prompts in bias evaluation
Open-sourced the bias evaluation pipeline
Abstract
Prompting large language models has gained immense popularity in recent years because it can produce good results without labelled data. However, it requires prompt tuning to find optimal prompts that lead to better model performance. In this paper, we explore the use of soft-prompt tuning on a sentiment classification task to quantify the biases of large language models (LLMs) such as Open Pre-trained Transformers (OPT) and the Galactica language model. Since these models are trained on real-world data that could be prone to bias toward certain population groups, it is important to identify these underlying issues. Using soft-prompts to evaluate bias gives us the extra advantage of avoiding the human-bias injection that can be caused by manually designed prompts. We check the model biases on different sensitive attributes using the group fairness (bias) and…
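To make the idea of a group-fairness check concrete, here is a minimal sketch in Python. The abstract is truncated and does not state the exact metric the authors use, so this is an assumption: it measures how far each group's positive-prediction rate on a sentiment task deviates from the overall rate, a demographic-parity-style gap. The function name `positive_rate_gaps`, the group labels, and the toy data are all hypothetical and are not taken from the paper or its pipeline.

```python
from collections import defaultdict


def positive_rate_gaps(predictions, groups):
    """Return each group's positive-prediction rate minus the overall rate.

    predictions: list of 0/1 sentiment labels predicted by the model.
    groups: list of sensitive-attribute values, one per prediction.
    This is an illustrative gap, not the paper's metric (assumption).
    """
    if len(predictions) != len(groups):
        raise ValueError("predictions and groups must have the same length")
    if not predictions:
        raise ValueError("at least one prediction is required")

    overall = sum(predictions) / len(predictions)
    counts = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        counts[group] += 1
        positives[group] += pred

    # A gap of 0 means the group is treated like the population as a whole.
    return {g: positives[g] / counts[g] - overall for g in counts}


# Hypothetical toy data: eight predictions split across two groups.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gaps = positive_rate_gaps(preds, groups)
```

In this toy example, group A receives positive predictions more often than the overall rate and group B less often. A gap near zero for every group would suggest no disparity under this particular measure.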
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
Methods: Galactica
