ArchiveGPT: A human-centered evaluation of using a vision language model for image cataloguing

Line Abele; Gerrit Anders; Tolgahan Aydın; Jürgen Buder; Helen Fischer; Dominik Kimmel; Markus Huff

arXiv:2507.07551 · cs.HC · July 11, 2025

TL;DR

This study evaluates the effectiveness of a vision language model in generating photographic catalog descriptions, highlighting the importance of human review and trust for successful integration into archival workflows.

Contribution

It provides a human-centered evaluation of AI-generated catalog descriptions, emphasizing the need for human oversight and trust-building in specialized archival contexts.

Findings

01

AI descriptions were often indistinguishable from human ones

02

Expert trust in AI tools was limited due to concerns about preservation

03

Human review is essential to ensure accuracy and quality of AI-generated metadata

Abstract

The accelerating growth of photographic collections has outpaced manual cataloguing, motivating the use of vision language models (VLMs) to automate metadata generation. This study examines whether AI-generated catalogue descriptions can approximate human-written quality and how generative AI might be integrated into cataloguing workflows in archival and museum collections. A VLM (InternVL2) generated catalogue descriptions for photographic prints with archaeological content on labelled cardboard mounts, and these descriptions were evaluated by archive and archaeology experts and by non-experts in a human-centered experimental framework. Participants classified descriptions as AI-generated or expert-written, rated their quality, and reported their willingness to use and trust in AI tools. Classification performance was above chance level, with both groups underestimating their ability to detect AI-generated descriptions. OCR errors…
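
The abstract states that classification performance was above chance level but does not say which test was used. As a hedged illustration only, the sketch below assumes a two-choice task (AI-generated vs. expert-written) with a chance rate of 0.5 and applies an exact one-sided binomial test; the function name `above_chance_p_value` and the counts in the usage line are hypothetical and are not taken from the study.

```python
from math import comb


def above_chance_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact one-sided binomial test (hypothetical helper).

    Returns P(X >= correct) for X ~ Binomial(trials, chance), i.e. the
    probability of doing at least this well by guessing alone.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie in [0, trials]")
    # Sum the binomial probabilities of every outcome at least as extreme.
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical participant: 14 of 20 descriptions labelled correctly.
p = above_chance_p_value(14, 20)
print(f"accuracy = {14 / 20:.2f}, one-sided p = {p:.4f}")
```

Note that 14 of 20 correct (70% accuracy) gives a one-sided p of roughly 0.058, so even a clearly better-than-coin-flip rate can miss a 0.05 threshold when the number of trials is small.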

Taxonomy

Topics: Digital Humanities and Scholarship · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques