TL;DR
This paper introduces a Transformer-based joint-encoding approach for simultaneous emotion recognition and sentiment analysis in multimodal language, utilizing co-attention and glimpse layers to effectively encode multiple modalities.
Contribution
It proposes a Transformer architecture with a modular co-attention and a glimpse layer that jointly encodes one or more modalities for emotion recognition and sentiment analysis.
Findings
Submitted to the ACL20 Second Grand-Challenge on Multimodal Language
Open-source code available for replication
Evaluated on CMU-MOSEI dataset
Abstract
Understanding expressed sentiment and emotions are two crucial factors in human multimodal language. This paper describes a Transformer-based joint-encoding (TBJE) for the task of Emotion Recognition and Sentiment Analysis. In addition to using the Transformer architecture, our approach relies on a modular co-attention and a glimpse layer to jointly encode one or more modalities. The proposed solution has also been submitted to the ACL20: Second Grand-Challenge on Multimodal Language to be evaluated on the CMU-MOSEI dataset. The code to replicate the presented experiments is open-source: https://github.com/jbdel/MOSEI_UMONS.
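The abstract names two building blocks, co-attention and a glimpse layer, without detailing them. The sketch below is a generic illustration of both ideas and not the authors' implementation: `cross_attention` is plain scaled dot-product attention in which one modality queries another, and `glimpse_pool` is a simple attention pooling that reduces a sequence to one vector per glimpse. The function names, the toy features, and the use of a fixed glimpse vector for scoring are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def glimpse_pool(sequence, glimpse_vectors):
    """Attention pooling (assumed form): each glimpse vector scores every
    time step, and the sequence is reduced to one weighted average per glimpse."""
    pooled = []
    for g in glimpse_vectors:
        weights = softmax([dot(g, x) for x in sequence])
        pooled.append([sum(w * x[j] for w, x in zip(weights, sequence))
                       for j in range(len(sequence[0]))])
    return pooled

# Toy features (hypothetical): 3 text steps and 2 audio steps, dimension 2.
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [1.0, -1.0]]

# One direction of co-attention: text positions attend over audio positions.
text_given_audio = cross_attention(text, audio, audio)

# Reduce the attended sequence to a single summary vector with one glimpse.
summary = glimpse_pool(text_given_audio, [[1.0, 0.0]])
```

In a full model the queries, keys, and values would pass through learned projections and multiple heads, and the pooled vector would feed a classifier. Those parts are omitted here to keep the two mechanisms visible.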
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout · Byte Pair Encoding
