How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval
Noa Garcia, George Vogiatzis

TL;DR
This paper introduces SemArt, a multi-modal dataset for semantic art understanding, together with Text2Art, a retrieval challenge in which paintings are retrieved from an artistic text and vice versa. The best model ranks the correct image within the top 10 in 45.5% of cases.
Contribution
The paper presents SemArt, a novel dataset and multi-modal retrieval task for semantic art understanding, along with models that encode visual and textual art representations into a shared semantic space.
Findings
The best model ranks the correct image within the top 10 results in 45.5% of cases.
The proposed models outperform the baseline on the multi-modal retrieval tasks.
Compared against human evaluation, the models show notable levels of art understanding.
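The top-10 figure above is a recall-at-K style measure. The sketch below shows one common way such a number can be computed once texts and images live in a shared embedding space. It is a minimal illustration, not the authors' evaluation code: the function names `cosine` and `recall_at_k`, the use of cosine similarity, and the toy 2-D embeddings are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recall_at_k(text_embs, image_embs, k=10):
    """Fraction of text queries whose paired image (same index) ranks in the top k.

    Hypothetical helper: assumes text_embs[i] and image_embs[i] describe the same painting.
    """
    hits = 0
    for i, query in enumerate(text_embs):
        scores = [cosine(query, img) for img in image_embs]
        # Zero-based rank of the correct image: how many images score strictly higher.
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return hits / len(text_embs)

# Toy data (assumed): three text/image pairs already embedded in a shared 2-D space.
texts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
images = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.7]]

print(recall_at_k(texts, images, k=1))
print(recall_at_k(texts, images, k=2))
```

Swapping the two arguments gives the image-to-text direction of the same measure, under the same pairing assumption.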
Abstract
Automatic art analysis has mostly focused on classifying artworks into different artistic styles. However, understanding an artistic representation involves more complex processes, such as identifying the elements in the scene or recognizing author influences. We present SemArt, a multi-modal dataset for semantic art understanding. SemArt is a collection of fine-art painting images in which each image is associated with a number of attributes and a textual artistic comment, such as those that appear in art catalogues or museum collections. To evaluate semantic art understanding, we envisage the Text2Art challenge, a multi-modal retrieval task where relevant paintings are retrieved according to an artistic text, and vice versa. We also propose several models for encoding visual and textual artistic representations into a common semantic space. Our best approach is able to find the…
