ReLER@ZJU Submission to the Ego4D Moment Queries Challenge 2022
Jiayi Shao, Xiaohan Wang, Yi Yang

TL;DR
This report introduces a transformer-based method with segment-level recurrence for localizing activities in long egocentric videos; the submission placed 3rd in the Ego4D Moment Queries Challenge 2022.
Contribution
It proposes a novel segment-level recurrence mechanism to improve long-term dependency modeling in transformer-based temporal action localization.
Findings
Achieved Recall@1,tIoU=0.5 of 37.24
Attained average mAP of 17.67
Secured 3rd place in the challenge leaderboard
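For readers unfamiliar with the metrics above, the following is a minimal sketch of temporal IoU (tIoU) and a simplified Recall@1 check at a tIoU threshold. It is not the official Ego4D evaluation code: the function names `temporal_iou` and `recall_at_1`, and the assumption that each ground-truth instance is paired with exactly one top-ranked prediction, are illustrative only.

```python
def temporal_iou(pred, gt):
    """tIoU between two (start, end) segments, e.g. in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, thresh=0.5):
    """Fraction of ground-truth segments whose paired top-1 prediction
    reaches the tIoU threshold (simplified, hypothetical pairing)."""
    hits = sum(temporal_iou(p, g) >= thresh for p, g in zip(top1_preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    preds = [(0.0, 10.0), (0.0, 10.0)]
    gts = [(2.0, 10.0), (20.0, 30.0)]
    print(recall_at_1(preds, gts))
```

In this toy input the first prediction overlaps its ground truth with tIoU 0.8 and the second does not overlap at all, so one of two instances counts as recalled.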
Abstract
In this report, we present the ReLER@ZJU submission to the Ego4D Moment Queries Challenge at ECCV 2022. The goal of this task is to retrieve and localize all instances of possible activities in egocentric videos. The Ego4D dataset is challenging for temporal action localization because the videos are long and each video contains multiple action instances with fine-grained action classes. To address these problems, we use a multi-scale transformer to classify action categories and predict the boundaries of each instance. Moreover, to better capture long-term temporal dependencies in long videos, we propose a segment-level recurrence mechanism. Compared with directly feeding all video features to the transformer encoder, the proposed segment-level recurrence mechanism alleviates the optimization difficulties and achieves better…
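The general idea of segment-level recurrence can be illustrated with a toy sketch. The excerpt above does not give the details of the authors' mechanism, so the code below only assumes a Transformer-XL-style scheme: the feature sequence is split into fixed-length segments, and each segment attends to itself plus the cached outputs of the previous segment. The names `attend` and `encode_with_recurrence`, the single-head attention without learned projections, and the one-segment memory are all assumptions made for illustration.

```python
import math


def attend(queries, context):
    """Toy single-head dot-product attention (no learned weights)."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in context]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, context)) / total
                    for d in range(dim)])
    return out


def encode_with_recurrence(features, segment_len):
    """Encode a long sequence segment by segment, reusing cached state."""
    memory = []   # outputs of the previous segment (empty for the first)
    outputs = []
    for start in range(0, len(features), segment_len):
        segment = features[start:start + segment_len]
        # Queries come from the current segment only; keys and values
        # also include the cached memory from the previous segment.
        hidden = attend(segment, memory + segment)
        outputs.extend(hidden)
        # In a real framework this cache would be detached from the graph.
        memory = hidden
    return outputs


if __name__ == "__main__":
    feats = [[float(i), 1.0] for i in range(6)]
    encoded = encode_with_recurrence(feats, segment_len=2)
    print(len(encoded), len(encoded[0]))
```

Each attention call here sees at most two segments, so its cost depends on the segment length rather than the full video length, while information from earlier segments can still propagate forward through the cached memory.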
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization
