# EmbraceNet: A robust deep learning architecture for multimodal classification

**Authors:** Jun-Ho Choi, Jong-Seok Lee

arXiv: 1904.09078 · 2019-04-22

## TL;DR

EmbraceNet is a new deep learning architecture designed for multimodal classification that effectively models cross-modal relationships and maintains high performance even when some data modalities are missing.

## Contribution

The paper introduces EmbraceNet, a multimodal fusion architecture for classification that is compatible with any kind of learning model and remains robust when part of the data, or an entire modality, is missing.
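
As a rough illustration of how a fusion step can tolerate a missing modality, the sketch below assumes that each modality has already been mapped to a vector of the same length, and that the fused vector is built by picking, for every feature index, the value from one randomly chosen available modality. This summary does not describe the mechanism, so the function name `embrace`, the uniform sampling, and the use of `None` for a missing modality are illustrative assumptions rather than the authors' implementation.

```python
import random
from typing import List, Optional, Sequence


def embrace(
    docked: Sequence[Optional[Sequence[float]]],
    rng: random.Random,
) -> List[float]:
    """Fuse equal-length per-modality vectors into one vector.

    `docked[k]` is the vector for modality k, or None if that modality
    is missing. Assumption: every available modality is equally likely
    to be chosen at each feature index.
    """
    # Missing modalities are simply excluded from the sampling pool.
    available = [k for k, vec in enumerate(docked) if vec is not None]
    if not available:
        raise ValueError("at least one modality must be available")

    size = len(docked[available[0]])
    if any(len(docked[k]) != size for k in available):
        raise ValueError("all available vectors must have the same length")

    fused = []
    for j in range(size):
        k = rng.choice(available)  # pick one modality for feature j
        fused.append(docked[k][j])
    return fused
```

For example, `embrace([audio, None], random.Random(0))` returns a copy of `audio` because it is the only available modality, while `embrace([audio, video], rng)` mixes values from both. The output length is the same in either case, so a downstream classifier would see a fixed-size input regardless of which modalities are present.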

## Key findings

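- Outperforms the other multimodal fusion architectures it is compared against when some parts of the data are unavailable
- Evaluated on two multimodal classification datasets, against other state-of-the-art models, under a range of conditions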
- Maintains high classification accuracy despite incomplete data

## Abstract

Classification using multimodal data arises in many machine learning applications. It is crucial not only to model cross-modal relationship effectively but also to ensure robustness against loss of part of data or modalities. In this paper, we propose a novel deep learning-based multimodal fusion architecture for classification tasks, which guarantees compatibility with any kind of learning models, deals with cross-modal information carefully, and prevents performance degradation due to partial absence of data. We employ two datasets for multimodal classification tasks, build models based on our architecture and other state-of-the-art models, and analyze their performance on various situations. The results show that our architecture outperforms the other multimodal fusion architectures when some parts of data are not available.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.09078/full.md

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/1904.09078/full.md

## References

53 references — full list in the complete paper: https://tomesphere.com/paper/1904.09078/full.md

---
Source: https://tomesphere.com/paper/1904.09078