A Multi-Modal Approach to Infer Image Affect

Ashok Sundaresan; Sugumar Murugesan; Sean Davis; Karthik Kappaganthu; ZhongYi Jin; Divya Jain; Anurag Maunder

arXiv:1803.05070 · cs.CV · March 15, 2018

TL;DR

This paper introduces a multi-modal deep learning approach that combines facial, scene, pose, text, and CNN features to infer the group affect in images of people. To the best of the authors' knowledge, it is the first work in which all of the modalities are extracted using deep neural networks.

Contribution

It presents a multi-modal framework in which every modality is extracted with a deep neural network, adding human pose, text-based tagging, and CNN-extracted features/predictions to the facial and scene modalities investigated in prior work.

Findings

01 Improved accuracy over baseline models

02 All modalities effectively contribute to affect inference

03 Insights into modality importance and integration

Abstract

The group affect or emotion in an image of people can be inferred by extracting features about both the people in the picture and the overall makeup of the scene. The state-of-the-art on this problem investigates a combination of facial features, scene extraction and even audio tonality. This paper combines three additional modalities, namely, human pose, text-based tagging and CNN extracted features / predictions. To the best of our knowledge, this is the first time all of the modalities were extracted using deep neural networks. We evaluate the performance of our approach against baselines and identify insights throughout this paper.
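
The abstract does not say how the per-modality outputs are combined, so the following is only a minimal sketch of one common option, late fusion by weighted averaging of per-modality class probabilities. The class labels, modality names, weights, and probability values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical affect classes; the abstract does not name the label set.
CLASSES = ["negative", "neutral", "positive"]


def fuse_predictions(per_modality: Dict[str, List[float]],
                     weights: Dict[str, float]) -> List[float]:
    """Late fusion: weighted average of per-modality class probabilities.

    This is an illustrative assumption, not the fusion method of the paper.
    """
    total = sum(weights[name] for name in per_modality)
    fused = [0.0] * len(CLASSES)
    for name, probs in per_modality.items():
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total
    return fused


def predict_affect(per_modality: Dict[str, List[float]],
                   weights: Dict[str, float]) -> str:
    """Return the class with the highest fused probability."""
    fused = fuse_predictions(per_modality, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best]


# Made-up probabilities standing in for the outputs of four modality models.
example = {
    "face": [0.1, 0.2, 0.7],
    "scene": [0.3, 0.4, 0.3],
    "pose": [0.2, 0.3, 0.5],
    "text": [0.1, 0.5, 0.4],
}
uniform = {name: 1.0 for name in example}  # equal weight per modality

print(predict_affect(example, uniform))
```

With equal weights, every modality contributes the same amount to the fused distribution. Raising the weight of one modality is a simple way to probe how much it drives the final prediction.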

Taxonomy

Topics: Face recognition and analysis · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis