Toward Systematic Counterfactual Fairness Evaluation of Large Language Models: The CAFFE Framework

Alessandra Parziale; Gianmario Voria; Valeria Pontillo; Gemma Catolino; Andrea De Lucia; Fabio Palomba

arXiv:2512.16816 · cs.SE · December 19, 2025

Open Access

TL;DR

The paper introduces CAFFE, a structured framework for systematically evaluating counterfactual fairness in large language models, enhancing bias detection beyond existing metamorphic testing methods.

Contribution

It proposes a formal, intent-aware testing framework with automated test data generation and semantic response evaluation for fairness assessment of LLMs.

Findings

1. CAFFE covers broader bias scenarios than previous methods.
2. It reliably detects unfair behaviors in various LLM architectures.
3. The framework improves fairness evaluation consistency.

Abstract

Nowadays, Large Language Models (LLMs) are foundational components of modern software systems. As their influence grows, concerns about fairness have become increasingly pressing. Prior work has proposed metamorphic testing to detect fairness issues, applying input transformations to uncover inconsistencies in model behavior. This paper introduces an alternative perspective for testing counterfactual fairness in LLMs, proposing a structured and intent-aware framework coined CAFFE (Counterfactual Assessment Framework for Fairness Evaluation). Inspired by traditional non-functional testing, CAFFE (1) formalizes LLM-Fairness test cases through explicitly defined components, including prompt intent, conversational context, input variants, expected fairness thresholds, and test environment configuration, (2) assists testers by automatically generating targeted test data, and (3) evaluates…
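To make the test-case components listed in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. The class `FairnessTestCase`, the function `run_test`, and the `model` callable are hypothetical names introduced here for illustration. Because the abstract is truncated before it describes the evaluation step, the sketch assumes a simple pass criterion: the lowest pairwise similarity between responses to counterfactual variants must meet the threshold. Token-level Jaccard overlap stands in for whatever semantic measure the framework actually uses.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable, Dict, List


@dataclass
class FairnessTestCase:
    """Hypothetical container for the components named in the abstract."""
    intent: str            # what the prompt is meant to achieve
    context: List[str]     # prior conversational turns
    variants: List[str]    # inputs differing only in a sensitive attribute
    threshold: float       # minimum acceptable similarity between responses
    environment: Dict[str, object] = field(default_factory=dict)  # e.g. model, temperature


def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap stand-in for a semantic similarity measure (assumption)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def run_test(
    case: FairnessTestCase,
    model: Callable[[List[str], str], str],
    similarity: Callable[[str, str], float] = jaccard_similarity,
) -> dict:
    """Query the model once per variant and compare every pair of responses."""
    responses = [model(case.context, variant) for variant in case.variants]
    scores = [similarity(r1, r2) for r1, r2 in combinations(responses, 2)]
    worst = min(scores) if scores else 1.0  # a single variant trivially passes
    return {"min_similarity": worst, "passed": worst >= case.threshold}


# Toy usage with a deliberately biased stub model (illustrative only).
case = FairnessTestCase(
    intent="loan eligibility advice",
    context=[],
    variants=["He applies for a loan.", "She applies for a loan."],
    threshold=0.8,
    environment={"temperature": 0.0},
)


def biased_model(context: List[str], prompt: str) -> str:
    return "Loan approved" if prompt.startswith("He ") else "Loan denied"


result = run_test(case, biased_model)
```

In this toy run the two responses share only one of three distinct tokens, so the test fails the 0.8 threshold. In practice the `similarity` argument would be replaced by an embedding-based or otherwise semantic comparison, and `environment` would record the model and decoding settings needed to reproduce the run.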

Taxonomy

Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)