SFMViT: SlowFast Meet ViT in Chaotic World

Jiaying Lin; Jiajun Wen; Mengyuan Liu; Jinfu Liu; Baiqiao Yin; Yue Li

arXiv:2404.16609 · cs.CV · August 14, 2024

TL;DR

This paper introduces SFMViT, a dual-stream spatiotemporal network that combines ViT and SlowFast with an anchor pruning strategy and reaches 26.62% mAP for action localization on the Chaotic World dataset.

Contribution

The paper presents a novel dual-stream architecture integrating ViT and SlowFast with an anchor pruning method for enhanced chaotic scene understanding.

Findings

01. Achieves 26.62% mAP on the Chaotic World dataset

02. Outperforms existing models in chaotic scene action localization

03. Demonstrates effective global and spatiotemporal feature extraction

Abstract

Spatiotemporal action localization in chaotic scenes is a challenging task on the path toward advanced video understanding. High-quality video feature extraction and more precise detector-predicted anchors can both effectively improve model performance. To this end, we propose SFMViT, a high-performance dual-stream spatiotemporal feature extraction network with an anchor pruning strategy. The backbone of SFMViT is composed of ViT and SlowFast with prior knowledge of spatiotemporal action localization, which fully utilizes ViT's excellent global feature extraction capabilities and SlowFast's spatiotemporal sequence modeling capabilities. In addition, we introduce a confidence maximum heap to prune the anchors detected in each video frame and keep only the effective ones. These designs enable our SFMViT to achieve a mAP of 26.62% in the Chaotic…
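
The anchor pruning step mentioned in the abstract can be illustrated with a minimal sketch. The abstract only states that a confidence maximum heap is used to prune per-frame anchors; the fixed keep count (`keep`), the anchor layout `(confidence, box)`, and the function name `prune_anchors` are assumptions made here for illustration and may differ from what the authors' repository does (for example, a confidence threshold could be used instead of a fixed count).

```python
import heapq
from typing import List, Tuple

# Hypothetical anchor layout: (confidence, (x1, y1, x2, y2)).
Box = Tuple[float, float, float, float]
Anchor = Tuple[float, Box]


def prune_anchors(anchors: List[Anchor], keep: int) -> List[Anchor]:
    """Return up to `keep` anchors of one frame, highest confidence first.

    Assumption: pruning means keeping the top-`keep` anchors by confidence.
    """
    # heapq implements a min-heap, so confidences are negated to get
    # max-heap behaviour; the index breaks ties and maps back to the anchor.
    heap = [(-conf, idx) for idx, (conf, _box) in enumerate(anchors)]
    heapq.heapify(heap)

    kept: List[Anchor] = []
    for _ in range(min(keep, len(heap))):
        _neg_conf, idx = heapq.heappop(heap)  # pops the most confident anchor
        kept.append(anchors[idx])
    return kept
```

Calling `prune_anchors(frame_anchors, keep=2)` on the detections of a single frame would return the two most confident anchors in descending order of confidence; applying it frame by frame gives the pruned anchor set that is passed on to the localization head in this sketch.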

Code & Models

Repositories

jfightyr/slowfast-meet-vit (PyTorch, official)

Taxonomy

Topics: Complex Systems and Time Series Analysis

Methods: Pruning