Affective Feedback Synthesis Towards Multimodal Text and Image Data
Puneet Kumar, Gaurav Bhat, Omkar Ingle, Daksh Goyal, and Balasubramanian Raman

TL;DR
This paper introduces a new task of affective feedback synthesis for multimodal data, proposing a system trained on a large dataset of images, texts, and human comments, to generate human-like feedback.
Contribution
It presents a novel multimodal feedback synthesis system and constructs a large-scale dataset for training and evaluation.
Findings
Generated feedback is semantically similar to ground-truth comments.
The system produces relevant and human-like responses.
Quantitative and qualitative evaluations show improved performance.
Abstract
In this paper, we define a novel task of affective feedback synthesis, which deals with generating feedback for an input text and its corresponding image in a way similar to how humans respond to multimodal data. A feedback synthesis system has been proposed and trained using ground-truth human comments along with image-text input. We have also constructed a large-scale dataset consisting of images, texts, Twitter user comments, and the number of likes for the comments by crawling news articles through Twitter feeds. The proposed system extracts textual features using a transformer-based textual encoder, while the visual features are extracted using a Faster region-based convolutional neural network (Faster R-CNN) model. The textual and visual features are concatenated to construct the multimodal features, from which the decoder synthesizes the feedback. We have compared the results of…
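The fusion step described in the abstract (textual and visual features joined into one multimodal representation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `concat_features` and `fuse_batch`, the toy dimensions, and the assumption that each modality yields one fixed-length vector per example are all hypothetical. The abstract does not say along which axis the features are concatenated.

```python
from typing import List, Sequence


def concat_features(text_vec: Sequence[float], visual_vec: Sequence[float]) -> List[float]:
    """Join one text feature vector and one visual feature vector end to end.

    Assumption: each modality is summarized as a single flat vector.
    """
    return list(text_vec) + list(visual_vec)


def fuse_batch(
    text_batch: Sequence[Sequence[float]],
    visual_batch: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate features example by example for a batch of image-text pairs."""
    if len(text_batch) != len(visual_batch):
        raise ValueError("each text needs exactly one corresponding image")
    return [concat_features(t, v) for t, v in zip(text_batch, visual_batch)]


# Toy stand-ins for encoder outputs; real feature sizes would be much larger.
text_batch = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # hypothetical text-encoder output
visual_batch = [[1.0, 2.0], [3.0, 4.0]]           # hypothetical visual-encoder output

fused = fuse_batch(text_batch, visual_batch)
print(len(fused), len(fused[0]))
```

In this sketch, each fused vector has length equal to the sum of the two input lengths, and it is this combined vector that a decoder would consume to generate feedback.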
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
