PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition

Yanbin Hao; Diansong Zhou; Zhicai Wang; Chong-Wah Ngo; Meng Wang

arXiv:2407.02934·cs.CV·July 4, 2024

TL;DR

PosMLP-Video introduces an efficient MLP-based backbone for video recognition that uses relative positional encoding to model spatio-temporal relations, achieving competitive accuracy with fewer parameters and FLOPs.

Contribution

The paper proposes PosMLP-Video, a lightweight MLP-like model with novel relative positional encoding and gating units for efficient spatio-temporal video modeling.

Findings

01. Achieves 59.0% top-1 accuracy on Something-Something V1

02. Achieves 70.3% top-1 accuracy on Something-Something V2

03. Achieves 82.1% top-1 accuracy on Kinetics-400

Abstract

In recent years, vision Transformers and MLPs have demonstrated remarkable performance in image understanding tasks. However, their inherently dense computational operators, such as self-attention and token-mixing layers, pose significant challenges when applied to spatio-temporal video data. To address this gap, we propose PosMLP-Video, a lightweight yet powerful MLP-like backbone for video recognition. Instead of dense operators, we use efficient relative positional encoding (RPE) to build pairwise token relations, leveraging small-sized parameterized relative position biases to obtain each relation score. Specifically, to enable spatio-temporal modeling, we extend the image PosMLP's positional gating unit to temporal, spatial, and spatio-temporal variants, namely PoTGU, PoSGU, and PoSTGU, respectively. These gating units can be feasibly combined into three types of spatio-temporal…
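
To make the relative positional encoding idea in the abstract concrete, here is a minimal sketch in plain Python. It is not taken from the official repository. It assumes a 1-D (temporal) token sequence, a learnable bias table with one entry per relative offset, and a gating step that multiplies one half of the features by the position-mixed other half. The function names (relative_bias_matrix, positional_gating, num_bias_params) and the exact form of the gate are illustrative assumptions; the paper's PoTGU, PoSGU, and PoSTGU units may differ in detail.

```python
# Hypothetical sketch of relative-position-bias token mixing.
# All names here are illustrative assumptions, not the official PosMLP-Video API.

def relative_bias_matrix(bias_table, num_tokens):
    """Expand a (2N-1)-entry bias table into an N x N relation matrix.

    Entry (i, j) depends only on the relative offset i - j, so the
    matrix is Toeplitz and needs 2N-1 parameters instead of N*N.
    """
    assert len(bias_table) == 2 * num_tokens - 1
    shift = num_tokens - 1  # maps offsets -(N-1)..(N-1) to indices 0..2N-2
    return [
        [bias_table[i - j + shift] for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def positional_gating(u, v, bias_table):
    """Gate u with position-mixed v.

    u, v: lists of N tokens, each a list of C channel values.
    Returns out[i][c] = u[i][c] * sum_j W[i][j] * v[j][c].
    """
    n, channels = len(u), len(u[0])
    w = relative_bias_matrix(bias_table, n)
    mixed = [
        [sum(w[i][j] * v[j][c] for j in range(n)) for c in range(channels)]
        for i in range(n)
    ]
    return [[u[i][c] * mixed[i][c] for c in range(channels)] for i in range(n)]


def num_bias_params(t, h, w):
    """Bias-table size for a T x H x W window with one entry per 3-D offset."""
    return (2 * t - 1) * (2 * h - 1) * (2 * w - 1)


if __name__ == "__main__":
    # A table that is 1 at offset 0 and 0 elsewhere yields the identity
    # relation matrix, so the gate reduces to an elementwise product u * v.
    table = [0, 0, 1, 0, 0]
    u = [[1, 2], [3, 4], [5, 6]]
    v = [[1, 1], [2, 2], [3, 3]]
    print(positional_gating(u, v, table))
```

Under these assumptions the relation scores come from a small lookup table rather than from dense pairwise computation: for a window of T x H x W tokens, a table indexed by 3-D offset has (2T-1)(2H-1)(2W-1) entries, whereas a dense token-mixing matrix over the same window would have (THW)^2 entries.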

Code & Models

Repositories

zhouds1918/posmlp_video (PyTorch, official)
