Neural-based Cross-modal Search and Retrieval of Artwork

Yan Gong; Georgina Cosma; Axel Finke

arXiv:2307.14244 · cs.MM · July 27, 2023

TL;DR

This paper presents BoonArt, a deep learning-based cross-modal search engine for artwork that significantly improves retrieval accuracy by leveraging visual-semantic embedding networks on the ArtUK dataset.

Contribution

It introduces BoonArt, the first VSE-based system tailored for artwork datasets, enabling effective image-text and text-image retrieval.

Findings

01. Achieved 97% Recall@10 for image-to-text retrieval
02. Achieved 97.4% Recall@10 for text-to-image retrieval
03. Outperforms traditional search engines on the ArtUK dataset

Abstract

Creating an intelligent search and retrieval system for artwork images, particularly paintings, is crucial for documenting cultural heritage, fostering wider public engagement, and advancing artistic analysis and interpretation. Visual-Semantic Embedding (VSE) networks are deep learning models used for information retrieval, which learn joint representations of textual and visual data, enabling 1) cross-modal search and retrieval tasks, such as image-to-text and text-to-image retrieval; and 2) relation-focused retrieval to capture entity relationships and provide more contextually relevant search results. Although VSE networks have played a significant role in cross-modal information retrieval, their application to painting datasets, such as ArtUK, remains unexplored. This paper introduces BoonArt, a VSE-based cross-modal search engine that allows users to search for images using…
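The retrieval idea and the Recall@10 metric mentioned above can be illustrated with a minimal sketch. It assumes that text and image embeddings already live in a shared space, ranks candidates by cosine similarity, and computes Recall@K. The function names, the toy vectors, and the one-correct-match-per-query setup are hypothetical choices made for illustration; this is not BoonArt's implementation or evaluation code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_candidates(query, candidates):
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

def recall_at_k(queries, candidates, ground_truth, k):
    """Fraction of queries whose correct candidate appears in the top-k results.

    ground_truth[i] is the index of the single correct candidate for query i.
    This is an assumption: real datasets may have several matches per query.
    """
    hits = 0
    for query, target in zip(queries, ground_truth):
        if target in rank_candidates(query, candidates)[:k]:
            hits += 1
    return hits / len(queries)

# Toy 2-D embeddings, made up for illustration only.
text_embeddings = [[1.0, 0.1], [0.1, 1.0], [0.7, 0.7]]
image_embeddings = [[0.9, 0.2], [0.0, 1.0], [1.0, 0.2]]
ground_truth = [0, 1, 2]  # text i is assumed to describe image i

# Text-to-image retrieval: each text embedding queries the image embeddings.
r1 = recall_at_k(text_embeddings, image_embeddings, ground_truth, k=1)
r2 = recall_at_k(text_embeddings, image_embeddings, ground_truth, k=2)
print(f"Recall@1 = {r1:.3f}, Recall@2 = {r2:.3f}")
```

Swapping the roles of the two embedding lists gives the image-to-text direction. In this toy setup, recall rises as K grows because a correct match that is ranked second still counts as a hit at K=2.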

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection