BiasLab: A Multilingual, Dual-Framing Framework for Robust Measurement of Output-Level Bias in Large Language Models

William Guey, Wei Zhang, Pei-Luen Patrick Rau, Pierrick Bougault, Vitor D. de Moura, Bertan Ucar, and Jose O. Gomes

arXiv:2601.06861 · cs.CL · January 13, 2026

TL;DR

BiasLab is an open-source framework that provides a standardized, multilingual, and robust method for measuring output-level bias in large language models through dual-framing and randomized evaluation techniques.

Contribution

The paper introduces a novel, model-agnostic evaluation framework that improves the reliability of bias measurement across languages and framing variations.

Findings

1. BiasLab is effective in quantifying demographic, cultural, political, and geopolitical biases.
2. It supports reproducible, comparative bias analysis across models and languages.
3. It enables benchmarking of model robustness to framing and prompt variations.

Abstract

Large Language Models (LLMs) are increasingly deployed in high-stakes contexts where their outputs influence real-world decisions. However, evaluating bias in LLM outputs remains methodologically challenging due to sensitivity to prompt wording, limited multilingual coverage, and the lack of standardized metrics that enable reliable comparison across models. This paper introduces BiasLab, an open-source, model-agnostic evaluation framework for quantifying output-level (extrinsic) bias through a multilingual, robustness-oriented experimental design. BiasLab constructs mirrored probe pairs under a strict dual-framing scheme: an affirmative assertion favoring Target A and a reverse assertion obtained by deterministic target substitution favoring Target B, while preserving identical linguistic structure. To reduce dependence on prompt templates, BiasLab performs repeated evaluation under…
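
The abstract describes the dual-framing scheme only at a high level (and is truncated here), so the following Python sketch illustrates the idea under stated assumptions rather than reproducing BiasLab's implementation. It assumes a probe template with hypothetical `{favored}` and `{other}` slots: filling them one way gives the affirmative assertion favoring Target A, and swapping the targets deterministically gives the reverse assertion favoring Target B with identical wording. The `net_preference` score over repeated trials is likewise a hypothetical aggregation chosen for illustration, not BiasLab's metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbePair:
    affirmative: str  # assertion favoring Target A
    reverse: str      # same template with the targets swapped, favoring Target B


def make_probe_pair(template: str, target_a: str, target_b: str) -> ProbePair:
    """Build a mirrored probe pair by deterministic target substitution.

    `template` is a hypothetical format string with `{favored}` and `{other}`
    slots; only the fillers change between framings, so the linguistic
    structure of the two probes is identical.
    """
    affirmative = template.format(favored=target_a, other=target_b)
    reverse = template.format(favored=target_b, other=target_a)
    return ProbePair(affirmative, reverse)


def net_preference(agree_affirmative: list[bool], agree_reverse: list[bool]) -> float:
    """Hypothetical score: agreement rate on the A-favoring probe minus the
    agreement rate on the B-favoring probe across repeated trials.

    Positive values lean toward Target A, negative toward Target B.
    """
    if not agree_affirmative or not agree_reverse:
        raise ValueError("each framing needs at least one trial")
    rate_a = sum(agree_affirmative) / len(agree_affirmative)
    rate_b = sum(agree_reverse) / len(agree_reverse)
    return rate_a - rate_b


pair = make_probe_pair("{favored} has better public services than {other}.", "City X", "City Y")
print(pair.affirmative)  # City X has better public services than City Y.
print(pair.reverse)      # City Y has better public services than City X.
print(net_preference([True, True, False, True], [True, False, False, False]))  # 0.5
```

Because both probes share one template, a systematic gap in agreement points to the target rather than the wording; under this assumed rule, a model that agrees (or disagrees) with every assertion regardless of target scores zero instead of appearing biased.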

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Artificial Intelligence in Healthcare and Education