Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?
Omkar Dige, Jacob-Junqi Tian, David Emerson, Faiza Khan Khattak

TL;DR
This paper evaluates instruction fine-tuned language models' ability to detect social biases through zero-shot prompting, highlighting the potential for bias identification and mitigation in large language models.
Contribution
It evaluates how well LLaMA and its two instruction fine-tuned versions identify social bias under zero-shot prompting, including Chain-of-Thought (CoT) prompts, as the first component of a planned bias mitigation framework.
Findings
Alpaca 7B performs best among LLaMA and its two instruction fine-tuned versions, with 56.7% accuracy on the bias identification task.
Scaling model size and data diversity may improve bias detection performance.
This work is an initial step towards bias mitigation in language models.
Abstract
As the breadth and depth of language model applications continue to expand rapidly, it is increasingly important to build efficient frameworks for measuring and mitigating the learned or inherited social biases of these models. In this paper, we present our work on evaluating instruction fine-tuned language models' ability to identify bias through zero-shot prompting, including Chain-of-Thought (CoT) prompts. Across LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task with an accuracy of 56.7%. We also demonstrate that scaling up LLM size and data diversity could lead to further performance gain. This is a work-in-progress presenting the first component of our bias mitigation framework. We will keep updating this work as we get more results.
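The evaluation setup described in the abstract (zero-shot prompts, optionally with a Chain-of-Thought cue, scored by accuracy) can be illustrated with a minimal sketch. The prompt wording, the yes/no answer format, and the helper names build_prompt, parse_label, and accuracy are all assumptions made for illustration; they are not the authors' prompts or code, and the model call itself is left out.

```python
from typing import Iterable, Optional

# Hypothetical prompt wording for illustration; not the paper's actual prompts.
INSTRUCTION = "Does the following sentence contain social bias? Answer 'yes' or 'no'."
COT_SUFFIX = "Let's think step by step."


def build_prompt(sentence: str, cot: bool = False) -> str:
    """Build a zero-shot prompt; optionally append a Chain-of-Thought cue."""
    prompt = f"{INSTRUCTION}\nSentence: {sentence}\nAnswer:"
    if cot:
        prompt += f" {COT_SUFFIX}"
    return prompt


def parse_label(response: str) -> Optional[bool]:
    """Map a free-text response to True (biased), False (not biased), or None."""
    words = response.lower().replace(".", " ").replace(",", " ").split()
    has_yes, has_no = "yes" in words, "no" in words
    if has_yes == has_no:  # neither or both present: treat as ambiguous
        return None
    return has_yes


def accuracy(predictions: Iterable[Optional[bool]], gold: Iterable[bool]) -> float:
    """Fraction of examples whose parsed prediction matches the gold label.

    Ambiguous predictions (None) are counted as incorrect.
    """
    preds, labels = list(predictions), list(gold)
    if len(preds) != len(labels):
        raise ValueError("predictions and gold must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p is not None and p == g for p, g in zip(preds, labels))
    return correct / len(labels)
```

In use, each sentence would be passed through build_prompt, sent to the model under evaluation, and the response mapped back to a label with parse_label before scoring. Counting unparseable responses as errors is one possible design choice; the source does not say how such cases were handled.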
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques
