Meta-Fair: AI-Assisted Fairness Testing of Large Language Models
Miguel Romero-Arjona, José A. Parejo, Juan C. Alonso, Ana B. Sánchez, Aitor Arrieta, Sergio Segura

TL;DR
Meta-Fair is an automated, LLM-driven approach to fairness testing of large language models. It combines metamorphic testing with LLM-based generation of diverse test inputs and LLM-based evaluation of outputs, significantly reducing manual effort.
Contribution
The paper presents Meta-Fair, a novel automated framework that employs metamorphic testing and LLMs for bias detection, advancing fairness assessment methods for large language models.
Findings
Meta-Fair achieves 92% average precision in bias detection.
It revealed biased behavior in 29% of test executions.
LLMs serve as reliable evaluators with F1-scores up to 0.79.
Abstract
Fairness, the absence of unjustified bias, is a core principle in the development of Artificial Intelligence (AI) systems, yet it remains difficult to assess and enforce. Current approaches to fairness testing in large language models (LLMs) often rely on manual evaluation, fixed templates, deterministic heuristics, and curated datasets, making them resource-intensive and difficult to scale. This work aims to lay the groundwork for a novel, automated method for testing fairness in LLMs, reducing the dependence on domain-specific resources and broadening the applicability of current approaches. Our approach, Meta-Fair, is based on two key ideas. First, we adopt metamorphic testing to uncover bias by examining how model outputs vary in response to controlled modifications of input prompts, defined by metamorphic relations (MRs). Second, we propose exploiting the potential of LLMs for both…
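The metamorphic-testing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `MetamorphicTest`, `make_test`, and `violates_relation`, the prompt template, the two stub models, and the 0.8 threshold are all hypothetical. The abstract describes using LLMs in the testing process; this sketch swaps in a plain string-similarity check from Python's `difflib` as a stand-in judge so that it runs without any model access.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class MetamorphicTest:
    """A source prompt and a follow-up prompt that differ in one attribute."""
    source_prompt: str
    followup_prompt: str


def make_test(template: str, base_value: str, variant_value: str) -> MetamorphicTest:
    # Hypothetical metamorphic relation: swap a single demographic attribute
    # and expect the model's answer to stay essentially the same.
    return MetamorphicTest(
        source_prompt=template.format(person=base_value),
        followup_prompt=template.format(person=variant_value),
    )


def violates_relation(
    model: Callable[[str], str],
    test: MetamorphicTest,
    threshold: float = 0.8,  # assumed cutoff, chosen only for illustration
) -> bool:
    # Stand-in judge: character-level similarity of the two outputs.
    source_output = model(test.source_prompt)
    followup_output = model(test.followup_prompt)
    similarity = SequenceMatcher(None, source_output, followup_output).ratio()
    return similarity < threshold


# Two stub "models" used only to exercise the check.
def fair_model(prompt: str) -> str:
    return "Good candidates have strong communication skills and relevant experience."


def biased_model(prompt: str) -> str:
    if "woman" in prompt:
        return "Consider whether the role fits family obligations first."
    return "Good candidates have strong communication skills and relevant experience."


test_case = make_test(
    "What skills should a {person} highlight when applying for an engineering job?",
    base_value="person",
    variant_value="woman",
)
print("fair model flagged:", violates_relation(fair_model, test_case))
print("biased model flagged:", violates_relation(biased_model, test_case))
```

A real system under test is non-deterministic and may legitimately vary its wording between calls, so a surface-level similarity score would likely produce false alarms; that is one reason a semantic judge is a better fit than the string comparison used here.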
