Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022

Maria Escobar; Laura Daza; Cristina González; Jordi Pont-Tuset; Pablo Arbeláez

arXiv:2207.11329·cs.CV·July 26, 2022·1 cites

Open Access

TL;DR

This paper applies Video Swin Transformers to egocentric video understanding tasks, specifically temporal localization and object state change classification, achieving competitive results on the Ego4D Challenges 2022.

Contribution

It uses Video Swin Transformer as the base architecture for two egocentric video tasks from the Ego4D Challenges 2022, Point-of-No-Return temporal localization and Object State Change Classification, and reports competitive performance on both.

Findings

1. Achieved competitive performance on Point-of-No-Return temporal localization.

2. Achieved competitive performance on Object State Change Classification with the same base architecture.

3. Validated the effectiveness of Video Swin Transformers on egocentric videos.

Abstract

We implemented Video Swin Transformer as a base architecture for the tasks of Point-of-No-Return temporal localization and Object State Change Classification. Our method achieved competitive performance on both challenges.
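
The abstract does not describe how the backbone's outputs are turned into predictions for the two tasks. The following is a minimal sketch of one plausible decoding step, assuming (this is not stated in the source) that the model emits one logit per sampled frame for Point-of-No-Return localization and two clip-level logits for Object State Change Classification. All function names, the logit layout, and the `sampled_fps` parameter are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pnr_frame(frame_logits):
    """Pick the sampled frame with the highest probability.

    Assumption: one logit per sampled frame, higher means the frame is
    more likely to be the point of no return.
    """
    probs = softmax(frame_logits)
    return max(range(len(probs)), key=probs.__getitem__)


def temporal_error_seconds(pred_idx, true_idx, sampled_fps):
    """Absolute distance between predicted and true frame, in seconds.

    Assumption: frames are sampled uniformly at `sampled_fps`.
    """
    return abs(pred_idx - true_idx) / sampled_fps


def predict_state_change(clip_logits):
    """Binary decision from two clip-level logits.

    Assumption: clip_logits is ordered [no_change, change].
    """
    probs = softmax(clip_logits)
    return probs[1] > probs[0]


if __name__ == "__main__":
    frame_logits = [0.1, 2.0, 0.3, -1.0]   # made-up values for illustration
    pred = predict_pnr_frame(frame_logits)
    print("predicted frame:", pred)
    print("error (s):", temporal_error_seconds(pred, 3, sampled_fps=2.0))
    print("state change:", predict_state_change([0.2, 1.1]))
```

Treating localization as a choice over sampled frames keeps both tasks as classification problems on top of a shared backbone, which is one common way to reuse a single video architecture for both; the authors' actual heads may differ.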

Taxonomy

Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Advanced Vision and Imaging

Methods: Attention Is All You Need · Linear Layer · Stochastic Depth · Position-Wise Feed-Forward Layer · Softmax · Byte Pair Encoding · Adam · Label Smoothing · Dense Connections · Dropout