Routing with Self-Attention for Multimodal Capsule Networks
Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander Liu, David Harwath, James Glass, Hilde Kuehne, Mubarak Shah

TL;DR
This paper introduces a scalable multimodal capsule network with a novel self-attention routing mechanism, enabling effective learning from large-scale noisy video data and improving multimodal task performance.
Contribution
The paper proposes a new self-attention based routing method for capsule networks, allowing large-scale multimodal learning with improved efficiency and robustness.
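To make the idea of routing with self-attention concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the function name `self_attention_routing`, the use of activation-weighted poses as queries, keys, and values (no learned projections), and the rule that derives new activations from the attention each capsule receives are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def self_attention_routing(poses, activations):
    """Hypothetical one-pass routing over N capsules.

    poses:       list of N pose vectors, each of length d
    activations: list of N activation values in [0, 1]
    Returns (routed_poses, routed_activations).
    """
    n, d = len(poses), len(poses[0])
    # Assumption: scale each capsule's pose by its activation so that
    # weakly active capsules contribute less to the attention scores.
    feats = [[a * x for x in p] for p, a in zip(poses, activations)]
    scale = math.sqrt(d)
    # attention[i][j]: how strongly capsule i attends to capsule j
    # (scaled dot-product attention, each row sums to 1).
    attention = [softmax([dot(q, k) / scale for k in feats]) for q in feats]
    # Each routed pose is an attention-weighted mix of all capsule features.
    routed_poses = [
        [sum(attention[i][j] * feats[j][c] for j in range(n)) for c in range(d)]
        for i in range(n)
    ]
    # Assumption: a capsule that receives more attention overall is more
    # relevant, so new activations are a softmax over attention received.
    received = [sum(attention[i][j] for i in range(n)) for j in range(n)]
    routed_activations = softmax(received)
    return routed_poses, routed_activations


# Two similar, strongly active capsules and one dissimilar, weak capsule.
example_poses = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
example_activations = [1.0, 1.0, 0.2]
routed_poses, routed_activations = self_attention_routing(
    example_poses, example_activations
)
```

Unlike iterative routing-by-agreement, this sketch needs a single pass of pairwise comparisons. That is one plausible reading of why attention-style routing could be cheaper at scale, though the paper's actual formulation may differ.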
Findings
Outperforms traditional routing techniques in multimodal tasks
Scales effectively to large video datasets
Achieves competitive results on multiple benchmarks
Abstract
The task of multimodal learning has seen growing interest recently, as it allows for training neural architectures based on different modalities such as vision, text, and audio. One challenge in training such models is that they need to jointly learn semantic concepts and their relationships across different input representations. Capsule networks have been shown to perform well in the context of capturing the relation between low-level input features and higher-level concepts. However, capsules have so far mainly been used in small-scale, fully supervised settings due to the resource demand of conventional routing algorithms. We present a new multimodal capsule network that allows us to leverage the strength of capsules in the context of a multimodal learning framework on large amounts of video data. To adapt the capsules to large-scale input data, we propose a novel routing by…
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
Methods: Capsule Network
