Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022
Maria Escobar, Laura Daza, Cristina González, Jordi Pont-Tuset, Pablo Arbeláez

TL;DR
This paper applies Video Swin Transformers to two egocentric video understanding tasks, Point-of-No-Return temporal localization and Object State Change Classification, and reports competitive results on both at the Ego4D Challenges 2022.
Contribution
The paper uses Video Swin Transformer as the base architecture for both egocentric video tasks and shows that it performs competitively on each.
Findings
Achieved competitive performance on Point-of-No-Return temporal localization.
Achieved competitive performance on Object State Change Classification.
Showed that a single Video Swin Transformer base architecture serves both egocentric video tasks.
Abstract
We implemented Video Swin Transformer as a base architecture for the tasks of Point-of-No-Return temporal localization and Object State Change Classification. Our method achieved competitive performance on both challenges.
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Advanced Vision and Imaging
Methods: Attention Is All You Need · Linear Layer · Stochastic Depth · Position-Wise Feed-Forward Layer · Softmax · Byte Pair Encoding · Adam · Label Smoothing · Dense Connections · Dropout
