Bias Ahead: Sensitive Prompts as Early Warnings for Fairness in Large Language Models

Gianmario Voria; Martina De Lucia; Alessandra Raia; Andrea De Lucia; Gemma Catolino; Fabio Palomba

arXiv:2604.05575 · cs.SE · April 8, 2026

TL;DR

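This paper introduces sensitive prompts, inputs that are not inherently biased but are more likely to elicit biased or inadequate responses, as a new abstraction for evaluating fairness in large language models. It highlights their potential to serve as early warnings for bias and ethical issues before deployment.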

Contribution

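The paper presents SensY, a dataset of 12,801 prompts labeled as sensitive or non-sensitive across seven thematic domains. It manually analyzes 4,500 responses from three open-source LLMs to evaluate their adequacy in answering sensitive prompts, and develops a classifier that predicts prompt sensitivity for fairness assessment.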

Findings

01. Models often give factually correct answers but overlook ethical implications.

02. The automated classifier effectively predicts prompt sensitivity.

03. Prompt sensitivity can serve as an early-warning tool for fairness risks.
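
To make the early-warning idea concrete, here is a minimal Python sketch of how a sensitivity prediction could gate prompts before they reach a model. It is an illustration under stated assumptions, not the paper's classifier: the cue list `SENSITIVE_CUES`, the function names, and the `"review"`/`"direct"` routes are all hypothetical, and the keyword check is a toy stand-in for a trained model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical cue list for illustration only; it is NOT taken from SensY.
SENSITIVE_CUES = {"religion", "ethnicity", "gender", "disability"}


@dataclass
class GateResult:
    prompt: str
    sensitive: bool
    route: str  # "review" for flagged prompts, "direct" otherwise


def keyword_sensitivity(prompt: str) -> bool:
    """Toy stand-in for a trained sensitivity classifier (assumption)."""
    words = {w.strip(".,?!").lower() for w in prompt.split()}
    return bool(words & SENSITIVE_CUES)


def early_warning_gate(
    prompt: str,
    is_sensitive: Callable[[str], bool] = keyword_sensitivity,
) -> GateResult:
    """Flag a prompt before it is sent to an LLM and pick a route for it."""
    flagged = is_sensitive(prompt)
    return GateResult(prompt, flagged, "review" if flagged else "direct")


if __name__ == "__main__":
    for p in ["What is the capital of France?", "Which religion is best?"]:
        print(early_warning_gate(p))
```

Because `is_sensitive` is passed in as a plain callable, the keyword heuristic could be swapped for any real classifier without changing the gate itself.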

Abstract

Large Language Models (LLMs) are being increasingly integrated into software systems, offering powerful capabilities but also raising concerns about fairness. Existing fairness benchmarks, however, focus on stereotype-specific associations, which limit their ability to anticipate risks in diverse, real-world contexts. In this paper, we propose sensitive prompts as a new abstraction for fairness evaluation: inputs that are not inherently biased but are more likely to elicit biased or inadequate responses due to the sensitivity of their content. We construct and release SensY, a dataset of 12,801 prompts, categorized as sensitive and non-sensitive, spanning seven thematic domains, combining synthetic generation and real user inputs. Using this dataset, we query three open-source LLMs and manually analyze 4,500 responses to evaluate their adequacy in answering sensitive prompts. Results…
