Catalogue Grounded Multimodal Attribution for Museum Video under Resource and Regulatory Constraints
Minsak Nanang, Adrian Hilton, Armin Mustafa

TL;DR
This paper introduces a multimodal attribution framework for museum videos that automates metadata curation, improving discoverability while respecting resource and regulatory constraints.
Contribution
It presents a novel pipeline using a video language model for catalogue-grounded metadata generation in museum AV content, reducing manual effort.
Findings
Early deployments on a painting catalogue suggest improved archive discoverability.
The framework respects data sovereignty and regulatory constraints.
It offers a transferable template for applications in high-stakes domains.
Abstract
Audiovisual (AV) archives in museums and galleries are growing rapidly, but much of this material remains effectively locked away because it lacks consistent, searchable metadata. Existing archiving methods require extensive manual effort. We address this by automating the most labour-intensive part of the workflow: catalogue-style metadata curation for in-gallery video, grounded in an existing collection database. Concretely, we propose catalogue-grounded multimodal attribution for museum AV content using an open, locally deployable video language model. We design a multi-pass pipeline that (i) summarises artworks in a video, (ii) generates catalogue-style descriptions and genre labels, and (iii) attempts to attribute title and artist via conservative similarity matching to the structured catalogue. Early deployments on a painting catalogue suggest that this framework can improve…
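Step (iii) of the pipeline, conservative similarity matching, can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, thresholds, or catalogue schema, so everything below is an assumption: the `attribute` function, the `min_score` and `min_margin` parameters, the use of `difflib.SequenceMatcher` as the similarity measure, and the three example records are all hypothetical and are not taken from the paper. The idea shown is only that an attribution is accepted when the best catalogue match is both strong and clearly ahead of the runner-up, and that the system abstains otherwise.

```python
from difflib import SequenceMatcher
from typing import Optional

# Illustrative records only; not the catalogue or schema used in the paper.
CATALOGUE = [
    {"title": "The Hay Wain", "artist": "John Constable"},
    {"title": "The Fighting Temeraire", "artist": "J. M. W. Turner"},
    {"title": "Sunflowers", "artist": "Vincent van Gogh"},
]


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (an assumed stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def attribute(
    description: str,
    catalogue: list[dict],
    min_score: float = 0.6,   # assumed threshold: best match must be at least this strong
    min_margin: float = 0.1,  # assumed margin: best match must beat the runner-up by this much
) -> Optional[dict]:
    """Return the matching catalogue record, or None to abstain."""
    if not catalogue:
        return None

    # Score every record against the generated description, best first.
    scored = sorted(
        ((similarity(description, f"{rec['title']} {rec['artist']}"), rec) for rec in catalogue),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best_score, best_record = scored[0]
    runner_up_score = scored[1][0] if len(scored) > 1 else 0.0

    # Conservative rule: abstain on weak or ambiguous matches instead of guessing.
    if best_score < min_score or best_score - runner_up_score < min_margin:
        return None
    return best_record


if __name__ == "__main__":
    print(attribute("the hay wain john constable", CATALOGUE))
    print(attribute("xyz", CATALOGUE))
```

Returning `None` rather than the top-scoring record reflects the conservative framing in the abstract: under this sketch, a missing attribution is preferred to a wrong one, and a curator can review the abstained cases manually.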
