NVIDIA-UNIBZ Submission for EPIC-KITCHENS-100 Action Anticipation Challenge 2022
Tsung-Ming Tai, Oswald Lanz, Giuseppe Fiameni, Yi-Kwan Wong, Sze-Sen Poon, Cheng-Kuang Lee, Ka-Chun Cheung, Simon See

TL;DR
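This report presents two recurrent models for action anticipation in kitchen videos, a higher-order recurrent space-time transformer and a message-passing neural network with edge learning. Averaging their prediction scores placed second in the EPIC-KITCHENS-100 Action Anticipation Challenge 2022 with 19.61% overall mean top-5 recall.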
Contribution
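Describes two recurrent architectures for short-term action anticipation, a higher-order recurrent space-time transformer and a message-passing neural network with edge learning, together with the training pipeline used to build the submitted set of models.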
Findings
Achieved 19.61% overall mean top-5 recall on the EPIC-KITCHENS-100 test set.
Ranked second on the public leaderboard.
Obtained this result by averaging the prediction scores of a set of models built with the proposed training pipeline.
Abstract
In this report, we describe the technical details of our submission for the EPIC-KITCHENS-100 action anticipation challenge. Our models, the higher-order recurrent space-time transformer and the message-passing neural network with edge learning, are both recurrent architectures that observe only 2.5 seconds of context at inference time to form the action anticipation prediction. By averaging the prediction scores from a set of models compiled with our proposed training pipeline, we achieved strong performance on the test set: 19.61% overall mean top-5 recall, recorded as second place on the public leaderboard.
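The score averaging and the top-5 recall metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes each model outputs one score per class for every sample, and that "mean top-5 recall" is computed per class and then averaged over the classes present in the labels. The function names `average_scores` and `mean_topk_recall` are hypothetical and are not taken from the authors' code or from the official challenge evaluation.

```python
from collections import defaultdict


def average_scores(model_scores):
    """Average per-class scores across models.

    model_scores: list of score tables, one per model, each shaped
    [num_samples][num_classes] (assumed layout, for illustration only).
    """
    num_models = len(model_scores)
    return [
        # zip(*sample_rows) groups the scores of one class across all models
        [sum(vals) / num_models for vals in zip(*sample_rows)]
        # zip(*model_scores) groups the rows of one sample across all models
        for sample_rows in zip(*model_scores)
    ]


def mean_topk_recall(scores, labels, k=5):
    """Class-mean top-k recall: per-class hit rate, averaged over classes seen in labels."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row, label in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        totals[label] += 1
        hits[label] += int(label in topk)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy example: two models, two samples, three classes
model_a = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.1]]
model_b = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]]
ensemble = average_scores([model_a, model_b])
recall_at_1 = mean_topk_recall(ensemble, labels=[0, 1], k=1)
```

Averaging per class before ranking lets a model that is confident on one sample outweigh a less confident one, which is a common reason to ensemble at the score level rather than by majority vote.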
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems
Methods: Test
