Bias Ahead: Sensitive Prompts as Early Warnings for Fairness in Large Language Models
Gianmario Voria, Martina De Lucia, Alessandra Raia, Andrea De Lucia, Gemma Catolino, Fabio Palomba

TL;DR
This paper introduces sensitive prompts as a new way to evaluate fairness in large language models, highlighting their potential to serve as early warnings for bias and ethical issues before deployment.
Contribution
It presents the SensY dataset of sensitive prompts, analyzes LLM responses to these prompts, and develops a classifier to predict prompt sensitivity for fairness assessment.
Findings
Models often give factually correct answers but overlook ethical implications.
The automated classifier effectively predicts prompt sensitivity.
Prompt sensitivity can serve as an early-warning tool for fairness risks.
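To make the early-warning idea concrete, the following is a minimal sketch of a prompt-sensitivity classifier used as a pre-deployment gate. It is not the authors' classifier, whose design is not described here. It assumes a simple multinomial Naive Bayes model with Laplace smoothing over lowercase word tokens, and the names `NaiveBayesSensitivity`, `early_warning`, and the tiny training set are hypothetical and illustrative only, not drawn from SensY.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSensitivity:
    """Hypothetical sensitive/non-sensitive prompt classifier (multinomial Naive Bayes)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # Laplace smoothing constant
        self.word_counts = {}         # label -> Counter of token counts
        self.doc_counts = Counter()   # label -> number of training prompts
        self.vocab = set()

    def fit(self, prompts, labels):
        for text, label in zip(prompts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior for the label.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def early_warning(classifier, prompt):
    # Flag prompts predicted "sensitive" for closer fairness review before deployment.
    return classifier.predict(prompt) == "sensitive"


# Hypothetical toy training data (illustrative only, not from SensY).
train_prompts = [
    "Which religion is the most violent?",
    "Are women worse at math than men?",
    "Should immigrants be allowed to vote?",
    "Why are older workers less productive?",
    "How do I boil an egg?",
    "What is the capital of France?",
    "Convert 10 miles to kilometers.",
    "Recommend a good sorting algorithm for large lists.",
]
train_labels = ["sensitive"] * 4 + ["non-sensitive"] * 4

clf = NaiveBayesSensitivity().fit(train_prompts, train_labels)
```

Naive Bayes is chosen here only because it fits in a few lines without external libraries; a real early-warning tool would need a much larger labeled corpus and a stronger model, and its flags would route prompts to further fairness checks rather than block them outright.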
Abstract
Large Language Models (LLMs) are being increasingly integrated into software systems, offering powerful capabilities but also raising concerns about fairness. Existing fairness benchmarks, however, focus on stereotype-specific associations, which limits their ability to anticipate risks in diverse, real-world contexts. In this paper, we propose sensitive prompts as a new abstraction for fairness evaluation: inputs that are not inherently biased but are more likely to elicit biased or inadequate responses due to the sensitivity of their content. We construct and release SensY, a dataset of 12,801 prompts, categorized as sensitive and non-sensitive, spanning seven thematic domains, combining synthetic generation and real user inputs. Using this dataset, we query three open-source LLMs and manually analyze 4,500 responses to evaluate their adequacy in answering sensitive prompts. Results…
