Appearance-and-Relation Networks for Video Classification
Limin Wang, Wei Li, Wen Li, Luc Van Gool

TL;DR
This paper introduces ARTNet (Appearance-and-Relation Network), an architecture for video classification that models appearance (spatial) and relation (temporal) information in separate, explicit branches, and reports improved accuracy on standard action recognition benchmarks.
Contribution
The paper proposes SMART blocks, which split spatiotemporal modeling into an appearance branch for spatial modeling and a relation branch for temporal modeling. It stacks these blocks into ARTNets that are trained end to end from RGB input.
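The toy, pure-Python sketch below shows how a SMART-style block keeps the two kinds of modeling apart. It is not the authors' implementation. Frames are 1-D rows of single-channel pixels, and all weights are set by hand. The relation branch assumes a square-pooling design: a linear space-time filter whose response is squared and then summed across filters. Batch normalization, learned cross-channel pooling weights, and any channel reduction after fusion are omitted. The names `conv1d_valid`, `appearance_branch`, `relation_branch`, and `smart_block` are hypothetical.

```python
from typing import List

Frame = List[float]   # toy frame: one row of single-channel pixel values
Video = List[Frame]   # T frames, all of the same width


def conv1d_valid(frame: Frame, kernel: List[float]) -> Frame:
    """Linear combination of neighbouring pixels within one frame (valid correlation)."""
    k = len(kernel)
    return [sum(kernel[j] * frame[i + j] for j in range(k))
            for i in range(len(frame) - k + 1)]


def appearance_branch(video: Video, kernel: List[float]) -> Video:
    """Spatial modeling: filter every frame independently, then apply ReLU."""
    return [[max(0.0, v) for v in conv1d_valid(f, kernel)] for f in video]


def relation_branch(video: Video, filters: List[List[List[float]]]) -> Video:
    """Temporal modeling via square-pooling (an assumed formulation).

    Each filter holds one spatial kernel per frame offset and spans an odd
    number of consecutive frames. Its linear space-time response is squared,
    which introduces products of pixels from different frames, and the
    squared responses of all filters are summed (cross-channel pooling).
    """
    pad = len(filters[0]) // 2
    zero = [0.0] * len(video[0])
    padded = [zero] * pad + video + [zero] * pad  # zero-pad in time to keep T outputs
    out: Video = []
    for t in range(len(video)):
        pooled = [0.0] * (len(video[0]) - len(filters[0][0]) + 1)
        for filt in filters:
            resp = [0.0] * len(pooled)
            for dt, kernel in enumerate(filt):
                for i, v in enumerate(conv1d_valid(padded[t + dt], kernel)):
                    resp[i] += v
            pooled = [p + r * r for p, r in zip(pooled, resp)]
        out.append(pooled)
    return out


def smart_block(video: Video, app_kernel: List[float],
                rel_filters: List[List[List[float]]]) -> List[List[Frame]]:
    """Run both branches and stack their maps as two channels per time step.

    All kernels are assumed to have the same width so the two maps line up.
    """
    app = appearance_branch(video, app_kernel)
    rel = relation_branch(video, rel_filters)
    return [[a, r] for a, r in zip(app, rel)]


# A dot moving one pixel to the right per frame.
video = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
diff = [[[-1.0], [0.0], [1.0]]]  # one filter: frame t+1 minus frame t-1
for t, (app, rel) in enumerate(smart_block(video, [1.0], diff)):
    print(t, app, rel)
# prints:
# 0 [1.0, 0.0, 0.0] [0.0, 1.0, 0.0]
# 1 [0.0, 1.0, 0.0] [1.0, 0.0, 1.0]
# 2 [0.0, 0.0, 1.0] [0.0, 1.0, 0.0]
```

The appearance channel only sees the current frame. The relation channel at frame 1 responds where the dot left and where it arrived. For a static video (the same frame repeated), the interior frames give an all-zero relation map. The edge frames respond partly because of the zero padding in time, which this sketch uses for simplicity.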
Findings
ARTNets outperform 3D convolution-based methods on Kinetics, UCF101, and HMDB51.
SMART blocks learn better spatiotemporal features than the 3D convolution blocks they are compared against.
ARTNets achieve state-of-the-art results on multiple action recognition benchmarks.
Abstract
Spatiotemporal feature learning in videos is a fundamental problem in computer vision. This paper presents a new architecture, termed Appearance-and-Relation Network (ARTNet), to learn video representation in an end-to-end manner. ARTNets are constructed by stacking multiple generic building blocks, called SMART, whose goal is to simultaneously model appearance and relation from RGB input in a separate and explicit manner. Specifically, SMART blocks decouple the spatiotemporal learning module into an appearance branch for spatial modeling and a relation branch for temporal modeling. The appearance branch is implemented based on the linear combination of pixels or filter responses in each frame, while the relation branch is designed based on the multiplicative interactions between pixels or filter responses across multiple frames. We perform experiments on three action recognition…
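The worked expansion below shows why squaring a linear response taken across frames produces multiplicative interactions, while a per-frame linear combination does not. It is a simplified illustration that assumes the relation branch follows a square-pooling design. The symbols are introduced here for illustration and are not taken from the paper. $x_{t,i}$ denotes pixel (or filter response) $i$ in frame $t$, $w_i$ are spatial weights, and $u$, $v$ are the weights a two-frame relation filter applies to the same position $i$ in frames $t$ and $t+1$.

```latex
\begin{aligned}
a_{t,i} &= \sum_{j} w_j\, x_{t,i+j}
  && \text{appearance: linear in the pixels of frame } t \\
r_{t,i} &= \left(u\, x_{t,i} + v\, x_{t+1,i}\right)^2
         = u^2 x_{t,i}^2 + v^2 x_{t+1,i}^2 + 2uv\, x_{t,i}\, x_{t+1,i}
  && \text{relation: squared two-frame response}
\end{aligned}
```

The cross term $2uv\, x_{t,i}\, x_{t+1,i}$ couples the two frames multiplicatively, which no linear combination of pixels can produce. For example, with $u = -1$ and $v = 1$ the response becomes $r_{t,i} = (x_{t+1,i} - x_{t,i})^2$, which is zero whenever the pixel does not change between frames.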
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Face recognition and analysis
