Addressing Stereotypes in Large Language Models: A Critical Examination and Mitigation
Fatima Kazi

TL;DR
This paper critically examines biases in large language models, measures them with bias-specific benchmarks, and explores mitigation strategies such as fine-tuning and data augmentation to improve fairness and reduce stereotypes.
Contribution
It provides a comprehensive bias analysis of LLMs using multiple benchmarks and introduces effective mitigation techniques such as fine-tuning and data augmentation.
Findings
Fine-tuned models show improved bias mitigation, especially for implicit biases.
Models struggle with gender biases but perform better on racial biases.
Bias mitigation techniques can enhance model fairness by up to 20%.
Abstract
Large language models (LLMs), such as ChatGPT, have gained popularity in recent years with the advancement of Natural Language Processing (NLP), with use cases spanning many disciplines as well as daily life. LLMs inherit explicit and implicit biases from the datasets they were trained on; these biases can include social, ethical, cultural, religious, and other prejudices and stereotypes. It is important to examine such shortcomings comprehensively by identifying the existence and extent of these biases, recognizing their origin, and attempting to mitigate biased outputs, so that outputs are fair and harmful stereotypes and misinformation are reduced. This study inspects and highlights the need to address biases in LLMs amid the growth of generative Artificial Intelligence (AI). We utilize bias-specific benchmarks such as StereoSet and CrowS-Pairs to evaluate the existence of various biases in many…
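As a rough illustration of how a paired-sentence benchmark such as CrowS-Pairs is typically scored, the sketch below computes the percentage of sentence pairs for which a model assigns a higher score to the stereotypical sentence than to its anti-stereotypical counterpart; a value near 50% indicates no systematic preference. This is a minimal sketch under stated assumptions, not the paper's evaluation code: the function name `stereotype_preference_rate` is hypothetical, and the scores are made-up stand-ins for model log-likelihoods.

```python
from typing import List, Tuple


def stereotype_preference_rate(scores: List[Tuple[float, float]]) -> float:
    """Return the percentage of pairs where the stereotypical sentence scores higher.

    Each tuple is (stereotypical_score, anti_stereotypical_score), e.g. sentence
    log-likelihoods from a language model. Hypothetical helper, for illustration.
    """
    if not scores:
        raise ValueError("need at least one scored pair")
    preferred = sum(1 for stereo, anti in scores if stereo > anti)
    return 100.0 * preferred / len(scores)


# Made-up log-likelihoods (stereotypical, anti-stereotypical); not from the paper.
toy_scores = [(-12.1, -13.4), (-9.8, -9.2), (-15.0, -16.3), (-11.5, -11.0)]
rate = stereotype_preference_rate(toy_scores)
print(f"{rate:.1f}% of pairs favor the stereotypical sentence")
```

In practice the scores would come from the model under evaluation, and the rate would be reported per bias category (for example gender or race) so that categories can be compared.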
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Ethics and Social Impacts of AI · Computational and Text Analysis Methods
