AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

Rameswar Panda; Chun-Fu Chen; Quanfu Fan; Ximeng Sun; Kate Saenko; Aude Oliva; Rogerio Feris

arXiv:2105.05165·cs.CV·May 13, 2021

TL;DR

AdaMML introduces an adaptive framework for multi-modal video recognition that dynamically selects the most relevant modalities per segment, significantly reducing computation while improving accuracy.

Contribution

It proposes a novel adaptive multi-modal learning framework with a policy network that dynamically chooses modalities, enhancing efficiency and performance in video recognition.

Findings

01. Achieves a 35%-55% reduction in computation compared to a baseline that uses all modalities for every input.

02. Consistently outperforms state-of-the-art methods in accuracy.

03. Demonstrates effectiveness across four diverse datasets.
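To make the computation finding concrete, the arithmetic behind a "reduction in computation" figure can be sketched as follows. This is a minimal illustration, not the paper's measurement code: the function name `compute_reduction`, the per-modality costs, the policy overhead, and the selection masks are all hypothetical values chosen for the example, and the result is not a number reported in the paper.

```python
from typing import Dict, List


def compute_reduction(
    masks: List[Dict[str, bool]],
    cost: Dict[str, float],
    policy_cost: float = 0.0,
) -> float:
    """Fraction of compute saved versus running every modality on every segment.

    masks:       one dict per segment, mapping modality name -> keep (True) or skip (False)
    cost:        hypothetical cost of running each modality's sub-network on one segment
    policy_cost: hypothetical per-segment overhead of making the decision
    """
    if not masks:
        raise ValueError("need at least one segment")
    # Baseline: every modality is processed for every segment.
    baseline = len(masks) * sum(cost.values())
    # Adaptive: pay the decision overhead, plus only the modalities that were kept.
    adaptive = sum(
        policy_cost + sum(cost[name] for name, keep in mask.items() if keep)
        for mask in masks
    )
    return 1.0 - adaptive / baseline


# Illustrative numbers only (arbitrary cost units, made-up decisions).
cost = {"rgb": 4.0, "audio": 1.0}
masks = [
    {"rgb": True, "audio": True},
    {"rgb": False, "audio": True},
    {"rgb": True, "audio": False},
    {"rgb": False, "audio": False},
]
saved = compute_reduction(masks, cost, policy_cost=0.5)
print(f"compute saved: {saved:.0%}")
```

With these made-up inputs the baseline costs 20 units and the adaptive run costs 12, so the saving is 40%. The decision overhead is counted against the adaptive run, which is why skipping a cheap modality saves little unless the decision itself is cheap.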

Abstract

Multi-modal learning, which focuses on utilizing various modalities to improve the performance of a model, is widely used in video recognition. While traditional multi-modal learning offers excellent recognition results, its computational expense limits its impact for many real-world applications. In this paper, we propose an adaptive multi-modal learning framework, called AdaMML, that selects on-the-fly the optimal modalities for each segment conditioned on the input for efficient video recognition. Specifically, given a video segment, a multi-modal policy network is used to decide what modalities should be used for processing by the recognition model, with the goal of improving both accuracy and efficiency. We efficiently train the policy network jointly with the recognition model using standard back-propagation. Extensive experiments on four challenging diverse datasets demonstrate…
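The per-segment selection described in the abstract can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: `recognize_video`, `toy_policy`, `toy_rgb`, and `toy_audio` are hypothetical names, the policy here is a hand-written threshold rather than a learned network, and predictions are fused by plain averaging.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types: a segment maps a modality name to its raw features.
Segment = Dict[str, Sequence[float]]
Scores = List[float]


def recognize_video(
    segments: List[Segment],
    policy: Callable[[Segment], Dict[str, bool]],
    subnets: Dict[str, Callable[[Sequence[float]], Scores]],
    num_classes: int,
) -> Scores:
    """Average class scores over only the (segment, modality) pairs the policy keeps."""
    totals = [0.0] * num_classes
    used = 0
    for segment in segments:
        decisions = policy(segment)  # one keep/skip decision per modality
        for name, keep in decisions.items():
            if not keep:
                continue  # skipped modality: its sub-network is never run
            scores = subnets[name](segment[name])
            totals = [t + s for t, s in zip(totals, scores)]
            used += 1
    if used == 0:
        return totals  # nothing was selected; return all-zero scores
    return [t / used for t in totals]


def toy_policy(segment: Segment) -> Dict[str, bool]:
    # Stand-in for the learned policy: keep a modality if its mean |signal| exceeds 0.5.
    return {
        name: sum(abs(x) for x in feats) / len(feats) > 0.5
        for name, feats in segment.items()
    }


def toy_rgb(feats: Sequence[float]) -> Scores:
    return [sum(feats), 0.0]  # stand-in sub-network that votes for class 0


def toy_audio(feats: Sequence[float]) -> Scores:
    return [0.0, sum(feats)]  # stand-in sub-network that votes for class 1


video = [
    {"rgb": [1.0, 1.0], "audio": [0.0, 0.1]},  # near-silent audio is skipped
    {"rgb": [0.1, 0.2], "audio": [2.0, 2.0]},  # weak RGB signal is skipped
]
scores = recognize_video(
    video, toy_policy, {"rgb": toy_rgb, "audio": toy_audio}, num_classes=2
)
print(scores)
```

In this toy run only two of the four possible (segment, modality) pairs are processed. In the framework the abstract describes, the keep/skip decisions come from a policy network trained jointly with the recognition model, so the hard `if not keep` branch above would need a differentiable stand-in during training.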

Code & Models

Repositories

IBM/AdaMML
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Cancer-related molecular mechanisms research