DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

Thanh-Dat Truong; Quoc-Huy Bui; Chi Nhan Duong; Han-Seok Seo; Son Lam Phung; Xin Li; Khoa Luu

arXiv:2203.10233 · cs.CV · March 22, 2022 · 6 cites

TL;DR

This paper introduces DirecFormer, a novel Transformer-based framework with directed attention for robust human action recognition, addressing temporal ordering and sequence dependency issues to improve accuracy and generalization.

Contribution

It presents a new directed attention mechanism and models conditional dependencies in action sequences, advancing Transformer-based action recognition methods.

Findings

1. Achieves state-of-the-art results on the Jester, Kinetics-400, and Something-Something-V2 datasets.
2. Demonstrates robustness and improved temporal understanding over existing methods.
3. Introduces the concept of ordered temporal learning in action recognition.

Abstract

Human action recognition has recently become a popular research topic in the computer vision community. Various 3D-CNN-based methods have been presented to handle both the spatial and temporal dimensions of video action recognition, with competitive results. However, these methods suffer from fundamental limitations such as a lack of robustness and generalization, e.g., how does the temporal ordering of video frames affect the recognition results? This work presents a novel end-to-end Transformer-based Directed Attention (DirecFormer) framework for robust action recognition. The method takes a simple but novel Transformer-based perspective to understand the correct order of sequential actions. The contributions of this work are three-fold. Firstly, we introduce the problem of ordered temporal learning to action recognition.…

Code & Models

Repositories

uark-cviu/direcformer
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Advanced Neural Network Applications