Routing with Self-Attention for Multimodal Capsule Networks

Kevin Duarte; Brian Chen; Nina Shvetsova; Andrew Rouditchenko; Samuel Thomas; Alexander Liu; David Harwath; James Glass; Hilde Kuehne; Mubarak Shah

arXiv:2112.00775 · cs.CV · December 3, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a scalable multimodal capsule network with a novel self-attention routing mechanism, enabling effective learning from large-scale noisy video data and improving multimodal task performance.

Contribution

The paper proposes a new self-attention based routing method for capsule networks, allowing large-scale multimodal learning with improved efficiency and robustness.

Findings

01. Outperforms traditional routing techniques in multimodal tasks

02. Scales effectively to large video datasets

03. Achieves competitive results on multiple benchmarks

Abstract

The task of multimodal learning has seen growing interest recently, as it allows for training neural architectures based on different modalities such as vision, text, and audio. One challenge in training such models is that they need to jointly learn semantic concepts and their relationships across different input representations. Capsule networks have been shown to perform well in the context of capturing the relation between low-level input features and higher-level concepts. However, capsules have so far mainly been used in small-scale, fully supervised settings due to the resource demands of conventional routing algorithms. We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data. To adapt the capsules to large-scale input data, we propose a novel routing by…

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Capsule Network