Automatic Modeling of Social Concepts Evoked by Art Images as Multimodal Frames
Delfina Sol Martinez Pandiani, Valentina Presutti

TL;DR
This paper introduces a multimodal framework for automatically modeling social concepts evoked by art images, integrating multisensory data to bridge the semantic gap in visual understanding.
Contribution
It translates cognitive theories into a software approach that represents social concepts as multimodal frames using a novel ontology and applies it to art collections.
Findings
Successful extraction of social concepts from art images using multimodal features
Development of a formal ontology for social concepts as multimodal frames
Empirical validation on the Tate Gallery's art collection
Abstract
Social concepts referring to non-physical objects--such as revolution, violence, or friendship--are powerful tools to describe, index, and query the content of visual data, including ever-growing collections of art images from the Cultural Heritage (CH) field. While much progress has been made towards complete image understanding in computer vision, automatic detection of social concepts evoked by images is still a challenge. This is partly due to the well-known semantic gap problem, which is worse for social concepts given their lack of unique physical features and their reliance on less specific features than concrete concepts. In this paper, we propose the translation of recent cognitive theories about social concept representation into a software approach to represent them as multimodal frames, by integrating multisensory data. Our method focuses on the extraction, analysis, and integration…
