Don't Erase, Inform! Detecting and Contextualizing Harmful Language in Cultural Heritage Collections

Orfeas Menis Mastromichalakis; Jason Liartis; Kristina Rose; Antoine Isaac; Giorgos Stamou

arXiv:2505.24538·cs.CL·June 2, 2025

TL;DR

This paper presents an AI tool that detects offensive language in cultural heritage metadata and supplies historical and contextual information about each term, so that curators can make informed decisions without erasing the original descriptions.

Contribution

The paper introduces a multilingual, community-informed NLP system that contextualizes harmful terms in cultural heritage collections, aiding inclusive data curation.

Findings

01. Processed over 7.9 million records with the tool

02. Enabled contextual understanding of contentious terms

03. Supported integration with major cultural heritage platforms

Abstract

Cultural Heritage (CH) data hold invaluable knowledge, reflecting the history, traditions, and identities of societies, and shaping our understanding of the past and present. However, many CH collections contain outdated or offensive descriptions that reflect historical biases. CH Institutions (CHIs) face significant challenges in curating these data due to the vast scale and complexity of the task. To address this, we develop an AI-powered tool that detects offensive terms in CH metadata and provides contextual insights into their historical background and contemporary perception. We leverage a multilingual vocabulary co-created with marginalized communities, researchers, and CH professionals, along with traditional NLP techniques and Large Language Models (LLMs). Available as a standalone web app and integrated with major CH platforms, the tool has processed over 7.9 million records,…
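To make the detect-and-contextualize idea concrete, here is a minimal sketch of vocabulary-based term flagging. It is not the authors' implementation: the paper combines its co-created vocabulary with traditional NLP techniques and LLMs, whereas this sketch swaps in plain case-insensitive regular-expression matching. The `VOCABULARY` entries, the context notes, the `Match` class, and the `detect_terms` function are all hypothetical names and placeholder data introduced for illustration only.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabulary: language code -> {term: context note}.
# Placeholder entries for illustration; NOT taken from the paper's vocabulary.
VOCABULARY = {
    "en": {
        "primitive": "Illustrative note: may carry a colonial-era value judgement.",
        "exotic": "Illustrative note: may frame a culture as foreign or 'other'.",
    },
}


@dataclass
class Match:
    term: str   # the term as it appears in the text
    start: int  # character offset where the match begins
    end: int    # character offset just past the match
    note: str   # context note attached to the vocabulary entry


def detect_terms(text: str, lang: str) -> list[Match]:
    """Flag vocabulary terms in `text` and attach their context notes.

    Assumption: simple whole-word, case-insensitive matching. A real system
    would likely need lemmatization and disambiguation on top of this.
    """
    entries = VOCABULARY.get(lang, {})
    if not entries:
        return []
    # Try longer terms first so multi-word entries win over their substrings.
    alternatives = sorted(entries, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [
        Match(m.group(0), m.start(), m.end(), entries[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    record = "A primitive mask collected on an exotic island."
    for hit in detect_terms(record, "en"):
        print(f"{hit.term!r} at [{hit.start}, {hit.end}): {hit.note}")
```

The sketch leaves the original text untouched and only returns annotations, which mirrors the "inform, don't erase" stance described in the abstract: the record is kept as-is and the context note is layered on top.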

Code & Models

Repositories

ails-lab/de-bias (official)
