DM$^2$S$^2$: Deep Multi-Modal Sequence Sets with Hierarchical Modality Attention

Shunsuke Kitada; Yuki Iwazaki; Riku Togashi; Hitoshi Iyatomi

arXiv:2209.03126·cs.MM·November 23, 2022

TL;DR

This paper introduces DM$^2$S$^2$, a deep learning framework that models multimodal inputs as a set of sequences with hierarchical modality attention. It reports performance comparable or superior to previous mid-fusion models, and its attention weights can be visualized for interpretability.

Contribution

The paper proposes a set-aware multimodal learning approach with hierarchical attention mechanisms, addressing issues of high dimensionality and missing modalities in mid-fusion models.

Findings

01. Performance comparable or superior to previous models.

02. Visualization of attention weights offers interpretability.

03. Effective handling of multiple modalities with set-based approach.

Abstract

There is increasing interest in the use of multimodal data in various web applications, such as digital advertising and e-commerce. Typical methods for extracting important information from multimodal data rely on a mid-fusion architecture that combines the feature representations from multiple encoders. However, as the number of modalities increases, several potential problems with the mid-fusion model structure arise, such as an increase in the dimensionality of the concatenated multimodal features and missing modalities. To address these problems, we propose a new concept that considers multimodal inputs as a set of sequences, namely, deep multimodal sequence sets (DM$^2$S$^2$). Our set-aware concept consists of three components that capture the relationships among multiple modalities: (a) a BERT-based encoder to handle the inter- and intra-order of elements in the sequences, (b)…
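To make the dimensionality and missing-modality concerns concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It contrasts concatenation-style mid-fusion with a generic softmax attention pooling over a set of modality vectors. The function names (`mid_fusion_concat`, `attention_pool`), the toy feature vectors, the fixed query vector, and the zero-filling of a missing modality are all assumptions made for illustration; the paper's own encoder and attention components are not reproduced here.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def mid_fusion_concat(features: Dict[str, Optional[Vector]], dim: int) -> Vector:
    """Concatenate per-modality vectors in a fixed (sorted) order.

    Assumption: a missing modality (None) is replaced by a zero vector,
    because concatenation needs a fixed-size input.
    """
    fused: Vector = []
    for name in sorted(features):
        vec = features[name]
        fused.extend(vec if vec is not None else [0.0] * dim)
    return fused


def attention_pool(
    features: Dict[str, Optional[Vector]], query: Vector
) -> Tuple[Vector, Dict[str, float]]:
    """Scaled dot-product attention pooling over the modalities that are present.

    Returns the pooled vector and the per-modality attention weights.
    Missing modalities are simply left out of the set.
    """
    present = {k: v for k, v in features.items() if v is not None}
    if not present:
        raise ValueError("at least one modality must be present")

    names = sorted(present)
    scale = math.sqrt(len(query))
    scores = [
        sum(q * x for q, x in zip(query, present[n])) / scale for n in names
    ]

    # Numerically stable softmax over the present modalities.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = {n: e / total for n, e in zip(names, exps)}

    pooled = [
        sum(weights[n] * present[n][i] for n in names)
        for i in range(len(query))
    ]
    return pooled, weights


# Toy example: three modalities of dimension 4, one of them missing.
features = {
    "title": [1.0, 0.0, 0.0, 0.0],
    "image": [0.0, 1.0, 0.0, 0.0],
    "description": None,  # missing modality
}
query = [1.0, 1.0, 0.0, 0.0]  # hypothetical fixed query; learned in practice

fused = mid_fusion_concat(features, dim=4)
pooled, weights = attention_pool(features, query)

print(len(fused), len(pooled))
print(weights)
```

With concatenation, the fused vector grows linearly with the number of modalities (here 3 × 4 = 12) and a missing modality needs a placeholder. Pooling over the set keeps the output at the per-modality dimension and skips absent entries, and the returned weights are the kind of quantity one could inspect for interpretability.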

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies · Multimodal Machine Learning Applications