BiAxisAudit: A Novel Framework to Evaluate LLM Bias Across Prompt Sensitivity and Response-Layer Divergence
Jialing Gan, Junhao Dong, Songze Li

TL;DR
This paper introduces BiAxisAudit, a comprehensive framework for evaluating large language model bias across different prompt formats and response layers, revealing hidden biases and inconsistencies.
Contribution
It proposes a novel bias assessment protocol that measures bias reliability on orthogonal axes, accounting for prompt sensitivity and internal response divergence.
Findings
Bias varies significantly with prompt format and response layer.
A large share of the bias signal is confined to specific coding layers.
Prompt configurations can reduce bias or merely redistribute it across response components.
Abstract
Bias audits of large language models now operate within governance frameworks such as the EU AI Act, making benchmark reliability a security concern in its own right. Many current benchmarks, however, collapse bias into a single scalar from one prompt format and one surface label. This design misses two failure modes that can be exploited without changing model weights. Across prompts, meaning-preserving format changes shift bias endorsement by more than on a fixed statement pool. Within a response, the discrete Selection and free-text Elaboration can take opposing stances, so an apparently clean aggregate may hide substantial internal inconsistency (a "cancellation trap"). Selection-only and elaboration-only rankings are therefore nearly uncorrelated across eight LLMs (Spearman , ): LLaMA3-70B ranks in the middle under selection-only scoring but highest…
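The two failure modes described in the abstract can be illustrated with a small sketch. It is not the BiAxisAudit implementation: the stance encoding (+1 for endorsing a biased statement, -1 for rejecting it), the toy responses, the per-model scores, and the helper names `rank` and `spearman` are all assumptions made for illustration. The first part shows how opposing Selection and Elaboration stances can average to a clean-looking aggregate; the second shows how two per-layer rankings of the same models can be uncorrelated.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks, largest value first. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data of equal length."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank(xs), rank(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical responses as (selection stance, elaboration stance).
# +1 = endorses the biased statement, -1 = rejects it (assumed encoding).
responses = [(+1, -1), (-1, +1), (+1, -1), (-1, +1)]

# A single scalar that averages both layers looks perfectly clean...
aggregate = mean((s + e) / 2 for s, e in responses)

# ...while every response contradicts itself across its two layers.
divergence_rate = mean(1 if s != e else 0 for s, e in responses)

# Toy per-model bias scores under each scoring rule (made-up numbers).
selection_scores = [0.1, 0.2, 0.3, 0.4]
elaboration_scores = [0.3, 0.1, 0.4, 0.2]
rho = spearman(selection_scores, elaboration_scores)

print(f"aggregate={aggregate}, divergence_rate={divergence_rate}, rho={rho}")
```

In this toy setup the aggregate is zero even though all four responses are internally inconsistent, which is the pattern the abstract calls a cancellation trap. Reporting a divergence rate alongside per-layer scores is one simple way to surface it; the paper's own metrics may be defined differently.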
