# EmoBed: Strengthening Monomodal Emotion Recognition via Training with Crossmodal Emotion Embeddings

**Authors:** Jing Han, Zixing Zhang, Zhao Ren, Björn Schuller

arXiv: 1907.10428 · 2019-07-25

## TL;DR

EmoBed is a novel crossmodal emotion embedding framework that leverages auxiliary modalities during training to enhance monomodal emotion recognition performance without requiring multiple modalities during inference.

## Contribution

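The paper introduces EmoBed, a framework with two learning components: joint multimodal training, which uses a shared recognition network, and crossmodal training, which uses a shared emotion embedding space. Both components exploit the semantic emotion information carried by auxiliary modalities to improve a monomodal emotion recognition system.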

## Key findings

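- EmoBed significantly outperforms related baselines in monomodal inference.
- Its results are competitive with or superior to recently reported systems.
- Experiments cover two benchmark databases: RECOLA (dimensional emotion regression) and OMG-Emotion (categorical emotion classification).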

## Abstract

Despite remarkable advances in emotion recognition, current systems are severely restricted either by the inherently limited information of a single modality or by the requirement that all involved modalities be present synchronously. Motivated by this, we propose a novel crossmodal emotion embedding framework called EmoBed, which aims to leverage the knowledge from other auxiliary modalities to improve the performance of an emotion recognition system at hand. The framework includes two main learning components, i.e., joint multimodal training and crossmodal training. Both explore the underlying semantic emotion information, the former with a shared recognition network and the latter with a shared emotion embedding space. As a result, a system trained with this approach can efficiently make use of the complementary information from other modalities, yet these auxiliary modalities are not required during inference. To empirically investigate the effectiveness and robustness of the proposed framework, we perform extensive experiments on the two benchmark databases RECOLA and OMG-Emotion for the tasks of dimensional emotion regression and categorical emotion classification, respectively. The obtained results show that the proposed framework significantly outperforms related baselines in monomodal inference, and is also competitive with or superior to recently reported systems, which emphasises the importance of the proposed crossmodal learning for emotion recognition.
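The two learning components named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the linear encoders, the mean-squared-error recognition loss, the squared-distance alignment term, the `alpha` weight, and every name (`EmoBedSketch`, `audio_enc`, `video_enc`, `shared_head`) are assumptions made for illustration, and the loss used in the paper may differ. The sketch only shows the general idea that each modality has its own encoder into a shared embedding space, a single recognition head is shared across modalities, and inference needs one modality only.

```python
import random


def init_linear(n_out, n_in, rng):
    """Create small random weights and zero biases for an affine layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply an affine map: one output value per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


class EmoBedSketch:
    """Hypothetical two-modality model with a shared embedding space and head."""

    def __init__(self, audio_dim, video_dim, embed_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # Modality-specific encoders (assumed linear for brevity).
        self.audio_enc = init_linear(embed_dim, audio_dim, rng)
        self.video_enc = init_linear(embed_dim, video_dim, rng)
        # One recognition head shared by both modalities.
        self.shared_head = init_linear(out_dim, embed_dim, rng)

    def embed(self, modality, x):
        enc = self.audio_enc if modality == "audio" else self.video_enc
        return linear(*enc, x)

    def predict(self, modality, x):
        return linear(*self.shared_head, self.embed(modality, x))

    def training_loss(self, audio_x, video_x, target, alpha=1.0):
        # Joint multimodal term: both modalities pass through the shared head.
        joint = mse(self.predict("audio", audio_x), target) + mse(
            self.predict("video", video_x), target
        )
        # Crossmodal term (assumed form): pull paired embeddings together.
        cross = mse(self.embed("audio", audio_x), self.embed("video", video_x))
        return joint + alpha * cross


model = EmoBedSketch(audio_dim=4, video_dim=6, embed_dim=3, out_dim=2)
audio, video, label = [0.1] * 4, [0.2] * 6, [0.5, -0.5]
loss = model.training_loss(audio, video, label)  # training uses both modalities
audio_only_pred = model.predict("audio", audio)  # inference uses audio alone
```

In this sketch the video input appears only in `training_loss`, which mirrors the abstract's point that auxiliary modalities are not required at inference time.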

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.10428/full.md

## Figures

21 figures with captions in the complete paper: https://tomesphere.com/paper/1907.10428/full.md

## References

56 references — full list in the complete paper: https://tomesphere.com/paper/1907.10428/full.md

---
Source: https://tomesphere.com/paper/1907.10428