Toward Systematic Counterfactual Fairness Evaluation of Large Language Models: The CAFFE Framework
Alessandra Parziale, Gianmario Voria, Valeria Pontillo, Gemma Catolino, Andrea De Lucia, Fabio Palomba

TL;DR
The paper introduces CAFFE, a structured framework for systematically evaluating counterfactual fairness in large language models, enhancing bias detection beyond existing metamorphic testing methods.
Contribution
It proposes a formal, intent-aware testing framework with automated test data generation and semantic response evaluation for fairness assessment of LLMs.
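Here is one way automated test data generation for counterfactual fairness might look. This minimal Python sketch expands a single prompt template into variants that differ only in their protected attributes. The summary does not say how CAFFE generates its test data. The template syntax, the `generate_variants` function, and the attribute values below are hypothetical.

```python
from itertools import product


def generate_variants(template: str, attributes: dict[str, list[str]]) -> list[dict]:
    """Expand a prompt template into one counterfactual variant per
    combination of protected-attribute values (hypothetical format)."""
    keys = sorted(attributes)  # deterministic order of placeholders
    variants = []
    for values in product(*(attributes[k] for k in keys)):
        binding = dict(zip(keys, values))
        # Only the protected-attribute slots change; the rest of the prompt is fixed.
        variants.append({"attributes": binding, "prompt": template.format(**binding)})
    return variants


variants = generate_variants(
    "Should the bank approve a loan for a {age} {gender} software engineer?",
    {"gender": ["male", "female", "non-binary"], "age": ["young", "elderly"]},
)
for v in variants:
    print(v["prompt"])
```

Sorting the attribute keys keeps the enumeration order deterministic. Each variant can then be paired reliably with the model response it produces.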
Findings
CAFFE covers broader bias scenarios than previous methods.
It reliably detects unfair behaviors in various LLM architectures.
The framework improves fairness evaluation consistency.
Abstract
Nowadays, Large Language Models (LLMs) are foundational components of modern software systems. As their influence grows, concerns about fairness have become increasingly pressing. Prior work has proposed metamorphic testing to detect fairness issues, applying input transformations to uncover inconsistencies in model behavior. This paper introduces an alternative perspective for testing counterfactual fairness in LLMs, proposing a structured and intent-aware framework coined CAFFE (Counterfactual Assessment Framework for Fairness Evaluation). Inspired by traditional non-functional testing, CAFFE (1) formalizes LLM-Fairness test cases through explicitly defined components, including prompt intent, conversational context, input variants, expected fairness thresholds, and test environment configuration, (2) assists testers by automatically generating targeted test data, and (3) evaluates…
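The abstract lists five test case components: prompt intent, conversational context, input variants, expected fairness threshold, and test environment configuration. These can be sketched as a Python dataclass. This is an illustration under stated assumptions, not CAFFE's implementation. `FairnessTestCase`, `token_jaccard`, and the toy model are hypothetical. Token-level Jaccard overlap stands in, crudely, for the semantic response evaluation the authors describe. In this sketch, a case passes when every pair of responses is at least as similar as the threshold.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable


def token_jaccard(a: str, b: str) -> float:
    """Crude lexical stand-in for a semantic similarity metric (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class FairnessTestCase:
    intent: str          # what the prompt is meant to elicit
    context: list[str]   # prior conversational turns (hypothetical format)
    variants: list[str]  # prompts differing only in a protected attribute
    threshold: float     # minimum pairwise similarity required to pass
    environment: dict = field(default_factory=dict)  # e.g. model name, temperature

    def run(self, model: Callable[[list[str], str, dict], str],
            similarity: Callable[[str, str], float] = token_jaccard) -> dict:
        # Query the model once per counterfactual variant under the same setup.
        responses = [model(self.context, v, self.environment) for v in self.variants]
        # Compare every pair of responses; the least similar pair decides the verdict.
        scores = [similarity(a, b) for a, b in combinations(responses, 2)]
        worst = min(scores) if scores else 1.0
        return {"responses": responses, "min_similarity": worst,
                "passed": worst >= self.threshold}


def biased_model(context: list[str], prompt: str, env: dict) -> str:
    # Toy model that changes its answer when the prompt mentions "female".
    if "female" in prompt:
        return "The loan should be denied"
    return "The loan should be approved"


case = FairnessTestCase(
    intent="loan approval recommendation",
    context=[],
    variants=["Approve a loan for a male engineer?",
              "Approve a loan for a female engineer?"],
    threshold=0.9,
    environment={"model": "toy", "temperature": 0.0},
)
result = case.run(biased_model)
print(result["min_similarity"], result["passed"])
```

A real setup would replace `token_jaccard` with an embedding-based similarity and `biased_model` with a call to the LLM under test. Because the similarity function is a parameter, that substitution stays local to `run`.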
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
