Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective

Thanh-Dat Truong; Khoa Luu

arXiv:2305.15699 · cs.CV · August 27, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a novel cross-view learning method for action recognition that transfers knowledge from large-scale exocentric videos to egocentric videos using geometric constraints and a new self-attention loss, achieving state-of-the-art results.

Contribution

It proposes a geometric-based constraint integrated into Transformer self-attention and a cross-view self-attention loss for effective knowledge transfer across views.

Findings

01. Achieves state-of-the-art performance on Charades-Ego, EPIC-Kitchens-55, and EPIC-Kitchens-100.

02. Demonstrates the effectiveness of geometric constraints in cross-view attention.

03. Shows improved transfer learning from exocentric to egocentric videos.

Abstract

Understanding action recognition in egocentric videos has emerged as a vital research topic with numerous practical applications. Because egocentric data collection is limited in scale, learning robust deep learning-based action recognition models remains difficult. Transferring knowledge learned from large-scale exocentric data to egocentric data is challenging because videos differ across views. Our work introduces a novel cross-view learning approach to action recognition (CVAR) that effectively transfers knowledge from the exocentric to the egocentric view. First, we introduce a novel geometric-based constraint into the self-attention mechanism of the Transformer, based on analyzing the camera positions between the two views. Then, we propose a new cross-view self-attention loss learned on unpaired cross-view data to enforce the self-attention mechanism learning to…

Taxonomy

Topics: Human Pose and Action Recognition · Medical Imaging and Analysis · Stroke Rehabilitation and Recovery

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam · Dense Connections