Retrieval-Augmented Generation for Natural Language Art Provenance Searches in the Getty Provenance Index
Mathew Henrickson

TL;DR
This paper introduces a Retrieval-Augmented Generation framework that enhances natural language and multilingual searches in art provenance archives, improving accessibility and efficiency for researchers using the Getty Provenance Index.
Contribution
The paper presents a novel RAG-based method for semantic retrieval and summarization in art provenance research, addressing limitations of metadata-dependent search portals.
Findings
Effective retrieval and summarization of auction records
Scalable solution for navigating art market archives
Improves exploratory search capabilities
Abstract
This research presents a Retrieval-Augmented Generation (RAG) framework for art provenance studies, focusing on the Getty Provenance Index. Provenance research establishes the ownership history of artworks, which is essential for verifying authenticity, supporting restitution and legal claims, and understanding the cultural and historical context of art objects. The process is complicated by fragmented, multilingual archival data that hinders efficient retrieval. Current search portals require precise metadata, limiting exploratory searches. Our method enables natural-language and multilingual searches through semantic retrieval and contextual summarization, reducing dependence on metadata structures. We assess RAG's capability to retrieve and summarize auction records using a 10,000-record sample from the Getty Provenance Index - German Sales. The results show this approach provides a…
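The retrieve-then-summarize flow described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sample records are invented, the function names (`retrieve`, `build_prompt`) are hypothetical, and a simple term-frequency cosine similarity stands in for the semantic embedding model a real system would use. Unlike an embedding model, this stand-in cannot match across languages. The final step, passing the prompt to a language model for summarization, is left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy records, invented for illustration only;
# they are not taken from the Getty Provenance Index.
RECORDS = [
    {"id": "r1", "text": "Landscape with river, oil on canvas, sold at auction in Berlin 1932"},
    {"id": "r2", "text": "Portrait of a lady, pastel, sold at auction in Munich 1935"},
    {"id": "r3", "text": "Still life with flowers, oil on panel, sold in Cologne 1931"},
]


def vectorize(text):
    """Turn text into a term-frequency vector (a stand-in for an embedding)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, records, k=2):
    """Return the k records most similar to the free-text query."""
    q = vectorize(query)
    ranked = sorted(records, key=lambda r: cosine(q, vectorize(r["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, hits):
    """Assemble the retrieved records into a prompt for a summarizing model."""
    context = "\n".join(f"[{r['id']}] {r['text']}" for r in hits)
    return f"Answer using only these records:\n{context}\n\nQuestion: {query}"


query = "landscape sold in Berlin"
hits = retrieve(query, RECORDS)
prompt = build_prompt(query, hits)
print(prompt)
```

The point of the design is that the query is free text rather than a set of metadata fields: ranking by similarity returns the closest records even when the user does not know the exact catalogue terms.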
