Team PyKale (xy9) Submission to the EPIC-Kitchens 2021 Unsupervised Domain Adaptation Challenge for Action Recognition
Xianyuan Liu, Raivo Koot, Shuo Zhou, Tao Lei, Haiping Lu

TL;DR
This report presents a multi-modal, transformer-based approach with adversarial domain adaptation for action recognition on EPIC-Kitchens videos, where the source domain is labeled and the target domain is unlabeled. It reports performance comparable to the challenge baseline.
Contribution
It combines a per-modality transformer, a temporal attention module, and an adversarial domain adaptation network in a three-stream, late-fusion framework for unsupervised domain adaptation on video.
Findings
Achieved top-5 accuracy in all tasks
Outperformed baseline methods on verb and noun classification
Secured 5th place in the challenge
Abstract
This report describes the technical details of our submission to the EPIC-Kitchens 2021 Unsupervised Domain Adaptation Challenge for Action Recognition. The EPIC-Kitchens dataset is more difficult than other video domain adaptation datasets because it involves multiple tasks and more modalities. First, we employ a transformer to capture the spatial information from each modality. Second, we employ a temporal attention module to model temporal inter-dependency. Third, we employ an adversarial domain adaptation network to learn general features shared between the labeled source domain and the unlabeled target domain. Finally, we incorporate multiple modalities through a three-stream network with late fusion to improve performance. Our network achieves performance comparable to the state-of-the-art baseline TA3N and outperforms the baseline on top-1 accuracy for verb…
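The temporal attention and late fusion steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes attention is a softmax-weighted average of per-frame features and that late fusion averages per-stream class probabilities, neither of which is stated in the report. The function names and the modality names (rgb, flow, audio) are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(frame_features, frame_scores):
    """Pool per-frame feature vectors into one clip-level vector.

    Assumption: the attention weights are softmax(frame_scores); in a real
    model the scores would come from a learned layer.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]


def late_fusion(stream_logits):
    """Average class probabilities across streams (one possible fusion rule).

    stream_logits maps a (hypothetical) modality name to its class logits.
    Returns the fused probabilities and the index of the predicted class.
    """
    probs = [softmax(logits) for logits in stream_logits.values()]
    num_classes = len(probs[0])
    fused = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    predicted = max(range(num_classes), key=fused.__getitem__)
    return fused, predicted


# Two frames with equal attention scores contribute equally to the pooled vector.
pooled = temporal_attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

# Three streams, each producing logits over three classes.
fused, predicted = late_fusion({
    "rgb": [2.0, 0.5, 0.1],
    "flow": [1.5, 1.0, 0.2],
    "audio": [0.3, 0.2, 0.1],
})
```

Averaging probabilities rather than raw logits keeps each stream's contribution on the same scale, which is one reason it is a common choice for late fusion; weighted or learned fusion are equally plausible alternatives.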
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications
