Understanding and Meeting Practitioner Needs When Measuring Representational Harms Caused by LLM-Based Systems

Emma Harvey; Emily Sheng; Su Lin Blodgett; Alexandra Chouldechova; Jean Garcia-Gathright; Alexandra Olteanu; Hanna Wallach

arXiv:2506.04482·cs.CY·June 6, 2025

TL;DR

This paper examines the gap between publicly available instruments for measuring representational harms caused by LLM-based systems and the needs of the practitioners who evaluate those systems. It highlights practical and institutional barriers and proposes solutions based on measurement theory.

Contribution

The paper identifies key misalignments and barriers that keep practitioners from using existing instruments for measuring representational harms caused by LLM-based systems, and it offers recommendations to improve the practical utility of those instruments.

Findings

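01. Practitioners are often unable to use publicly available instruments for measuring representational harms.

02. Some instruments are misaligned with practitioner needs or do not meaningfully measure what practitioners seek to measure.

03. Practical and institutional barriers keep even useful instruments from being used.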

Abstract

The NLP research community has made publicly available numerous instruments for measuring representational harms caused by large language model (LLM)-based systems. These instruments have taken the form of datasets, metrics, tools, and more. In this paper, we examine the extent to which such instruments meet the needs of practitioners tasked with evaluating LLM-based systems. Via semi-structured interviews with 12 such practitioners, we find that practitioners are often unable to use publicly available instruments for measuring representational harms. We identify two types of challenges. In some cases, instruments are not useful because they do not meaningfully measure what practitioners seek to measure or are otherwise misaligned with practitioner needs. In other cases, instruments, even useful instruments, are not used by practitioners due to practical and institutional barriers…

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Natural Language Processing Techniques