MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Wenhao Wu; Dongliang He; Tianwei Lin; Fu Li; Chuang Gan; Errui Ding

arXiv:2012.06977 · cs.CV · January 6, 2021 · 6 cites

TL;DR

MVFNet introduces a multi-view fusion approach that uses 2D CNNs to model video dynamics efficiently from three planes (Height-Width, Height-Time, and Width-Time), achieving state-of-the-art results on action recognition benchmarks.

Contribution

The paper proposes a novel multi-view fusion module for 2D CNNs that captures video dynamics from multiple planes, enhancing efficiency and effectiveness in video recognition.

Findings

01. Achieves state-of-the-art performance on multiple benchmarks.

02. Maintains low complexity comparable to 2D CNNs.

03. Generalizes several existing video modeling methods.

Abstract

Conventionally, spatiotemporal modeling and its complexity have been the two most studied topics in video action recognition. Existing state-of-the-art methods achieve excellent accuracy regardless of complexity, while efficient spatiotemporal modeling solutions are slightly inferior in performance. In this paper, we attempt to achieve both efficiency and effectiveness. First, besides the traditional treatment of H x W x T video frames as a space-time signal (viewed from the Height-Width spatial plane), we propose to also model video from the other two planes, Height-Time and Width-Time, to capture the dynamics of video thoroughly. Second, our model is built on 2D CNN backbones, with model complexity kept in mind by design. Specifically, we introduce a novel multi-view fusion (MVF) module to exploit video dynamics using…
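The three viewing planes named in the abstract can be illustrated with a small sketch. This is not the MVF module itself (the abstract is truncated before describing it); it only shows, under assumed toy dimensions, how one H x W x T volume yields Height-Width, Height-Time, and Width-Time slices. The helper names `hw_plane`, `ht_plane`, `wt_plane`, and `shape` are hypothetical and chosen for illustration.

```python
# Illustrative only: a tiny grayscale "video" stored as video[t][h][w].
# The sizes and helper names are assumptions, not taken from the paper.
T, H, W = 4, 3, 2

# Each value encodes its own coordinates (t*100 + h*10 + w) so slices are easy to check.
video = [[[t * 100 + h * 10 + w for w in range(W)]
          for h in range(H)]
         for t in range(T)]


def hw_plane(video, t):
    """Height-Width view: the ordinary frame at time t (H rows, W columns)."""
    return [row[:] for row in video[t]]


def ht_plane(video, w):
    """Height-Time view: fix column w, giving H rows by T columns."""
    return [[video[t][h][w] for t in range(len(video))]
            for h in range(len(video[0]))]


def wt_plane(video, h):
    """Width-Time view: fix row h, giving W rows by T columns."""
    return [[video[t][h][w] for t in range(len(video))]
            for w in range(len(video[0][0]))]


def shape(plane):
    """Return (rows, columns) of a 2D nested list."""
    return (len(plane), len(plane[0]))


print(shape(hw_plane(video, 0)))  # spatial plane
print(shape(ht_plane(video, 0)))  # height-time plane
print(shape(wt_plane(video, 0)))  # width-time plane
```

Each of the three slices is a 2D array, which is why a 2D convolution could in principle be applied along any of the planes; how the paper actually does this is not stated in the visible part of the abstract.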

Taxonomy

Topics: Human Pose and Action Recognition · Diabetic Foot Ulcer Assessment and Management · Anomaly Detection Techniques and Applications

Methods: Convolution