Responsible Intelligence in Practice: A Fairness Audit of Open Large Language Models for Library Reference Services
Haining Wang, Jason Clark, Angelica Peña

TL;DR
This paper evaluates the fairness of open large language models used in library reference services, analyzing potential biases to ensure equitable support for diverse patrons.
Contribution
It applies a systematic evaluation approach that combines diagnostic classification (to detect systematic differences in responses) with linguistic analysis (to interpret their sources) to assess bias in open LLMs for library use.
Findings
No compelling evidence of systematic differentiation by race/ethnicity in any of the three models tested
Minor evidence of sex-linked differentiation in one model
Highlights the need for ongoing bias monitoring
Abstract
As libraries explore large language models (LLMs) as a scalable layer for reference services, a core fairness question follows: can LLM-based services support all patrons fairly, regardless of demographic identity? While LLMs offer great potential for broadening access to information assistance, they may also reproduce societal biases embedded in their training data, potentially undermining libraries' commitments to impartial service. In this chapter, we apply a systematic evaluation approach that combines diagnostic classification to detect systematic differences with linguistic analysis to interpret their sources. Across three widely used open models (Llama-3.1 8B, Gemma-2 9B, and Ministral 8B), we find no compelling evidence of systematic differentiation by race/ethnicity, and only minor evidence of sex-linked differentiation in one model. We discuss implications for responsible AI…
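The diagnostic classification idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the classifier (nearest centroid over word counts), the leave-one-out scheme, the permutation test, and the toy responses below are all assumptions chosen for illustration. The logic is that if a classifier can recover the patron's demographic group from the model's responses better than chance, the responses differ systematically by group.

```python
import random
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens mapped to their counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def loo_accuracy(texts, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    vecs = [bag_of_words(t) for t in texts]
    correct = 0
    for i, (vec, true_label) in enumerate(zip(vecs, labels)):
        # Build one centroid per label from every response except the held-out one.
        centroids = {}
        for j, (other, lab) in enumerate(zip(vecs, labels)):
            if j != i:
                centroids.setdefault(lab, Counter()).update(other)
        # sorted() keeps tie-breaking deterministic.
        predicted = max(sorted(centroids), key=lambda lab: cosine(vec, centroids[lab]))
        correct += predicted == true_label
    return correct / len(texts)


def permutation_p_value(texts, labels, n_perm=200, seed=0):
    """Share of label shuffles whose accuracy matches or beats the observed one."""
    rng = random.Random(seed)
    observed = loo_accuracy(texts, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += loo_accuracy(texts, shuffled) >= observed
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: responses to the same query under two patron groups.
# These strings are invented for illustration and deliberately exaggerated.
responses = [
    "check the catalog record", "the catalog record helps",
    "check the record", "catalog record check",
    "visit our reference desk", "our desk can assist",
    "reference desk can assist", "visit our desk",
]
groups = ["A"] * 4 + ["B"] * 4

observed, p_value = permutation_p_value(responses, groups)
print(f"accuracy={observed:.2f}, permutation p-value={p_value:.3f}")
```

In this exaggerated toy case the two groups receive responses with disjoint vocabularies, so the classifier separates them perfectly. A fairness finding like the one reported above would correspond to the opposite outcome: accuracy near chance and a large p-value. The linguistic analysis step, which inspects which words drive any detected difference, is not sketched here.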
Taxonomy
Topics: Library Science and Administration · Library Science and Information Systems · Text Readability and Simplification
