Understanding of Emotion Perception from Art
Digbalay Bose, Krishna Somandepalli, Souvik Kundu, Rimita Lahiri, Jonathan Gratch, Shrikanth Narayanan

TL;DR
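This paper studies how multimodal transformer models classify the emotions that artworks evoke in viewers, using images together with viewer-written captions. Single-stream models outperform image-only and dual-stream models, and MMBT improves on text-only BERT for the extreme positive and negative emotion classes.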
Contribution
The paper shows that single-stream multimodal transformers such as MMBT and VisualBERT outperform both image-only models and dual-stream models when classifying emotions evoked by art from images and viewer captions.
Findings
Single-stream multimodal models outperform dual-stream models, which use separate pathways for text and images.
MMBT and VisualBERT perform better than image-only models.
MMBT improves on text-only BERT for the extreme positive and negative emotion classes.
Abstract
Computational modeling of the emotions evoked by art in humans is a challenging problem because of the subjective and nuanced nature of art and affective signals. In this paper, we consider the above-mentioned problem of understanding emotions evoked in viewers by artwork using both text and visual modalities. Specifically, we analyze images and the accompanying text captions from the viewers expressing emotions as a multimodal classification task. Our results show that single-stream multimodal transformer-based models like MMBT and VisualBERT perform better compared to both image-only models and dual-stream multimodal models having separate pathways for text and image modalities. We also observe improvements in performance for extreme positive and negative emotion classes, when a single-stream model like MMBT is compared with a text-only transformer model like BERT.
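The distinction the abstract draws between single-stream and dual-stream models can be illustrated with a minimal sketch of how each arranges its inputs. This is not the authors' code: the function names, the placeholder tokens such as "[IMG]", and the text-before-image ordering are all assumptions made for illustration, and real models operate on embedding vectors rather than strings.

```python
# Hypothetical sketch of input layout only; names, tokens, and ordering are
# illustrative assumptions, not taken from the paper or from any library.

def single_stream_inputs(text_tokens, image_regions):
    """Build one joint sequence that a single shared encoder would process."""
    sequence = ["[CLS]"] + list(text_tokens) + ["[SEP]"] + list(image_regions)
    # Segment ids mark which modality each position came from (0 = text, 1 = image).
    segment_ids = [0] * (len(text_tokens) + 2) + [1] * len(image_regions)
    return sequence, segment_ids


def dual_stream_inputs(text_tokens, image_regions):
    """Keep one sequence per modality; each would go through its own pathway."""
    text_sequence = ["[CLS]"] + list(text_tokens) + ["[SEP]"]
    image_sequence = ["[IMG]"] + list(image_regions)
    return text_sequence, image_sequence


caption = ["the", "colors", "feel", "calm"]
regions = ["img_0", "img_1", "img_2"]

joint, segments = single_stream_inputs(caption, regions)
text_only, image_only = dual_stream_inputs(caption, regions)
```

In the single-stream layout, attention in the shared encoder can relate any caption token to any image region from the first layer onward, whereas the dual-stream layout defers cross-modal interaction to a later fusion step.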
Taxonomy
Topics: Multimodal Machine Learning Applications · Aesthetic Perception and Analysis · Generative Adversarial Networks and Image Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · VisualBERT · WordPiece · Adam · Dense Connections · Softmax · Dropout · Layer Normalization
