Egocentric Video Task Translation @ Ego4D Challenge 2022
Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani

TL;DR
This technical report introduces EgoTask Translation, a method that reuses models trained for related tasks to improve a primary egocentric video task. Without modifying the baseline architectures, it ranks 1st in the Ego4D talking to me challenge and 3rd in the PNR keyframe localization challenge.
Contribution
The paper presents a novel task translator that learns to translate auxiliary task features to enhance primary task performance in egocentric video analysis.
Findings
Ranked 1st in the Ego4D talking to me challenge
Ranked 3rd in the Ego4D PNR keyframe localization challenge
Demonstrated competitive performance without modifying baseline models
Abstract
This technical report describes the EgoTask Translation approach, which explores relations among a set of egocentric video tasks in the Ego4D challenge. To improve the primary task of interest, we propose to leverage existing models developed for other related tasks and design a task translator that learns to "translate" auxiliary task features to the primary task. With no modification to the baseline architectures, our proposed approach achieves competitive performance on two Ego4D challenges, ranking 1st in the talking to me challenge and 3rd in the PNR keyframe localization challenge.
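The abstract does not specify how the task translator is built, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes that each task model (primary and auxiliary) is frozen and emits a fixed-size feature vector, and that the translator fuses those vectors with a learned attention query before a small prediction head. The class `ToyTaskTranslator`, the task names `primary`, `aux_a`, and `aux_b`, and all feature values are hypothetical. Only the forward pass is shown; in training, only the translator's parameters would be updated.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class ToyTaskTranslator:
    """Hypothetical fusion module (an assumption, not the paper's design).

    It attends over per-task feature vectors produced by frozen task models
    and maps the fused vector to a probability for the primary task.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Parameters that would be learned; randomly initialised here.
        self.query = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.head = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.bias = 0.0

    def attention_weights(self, task_features):
        """Return one attention weight per task; the weights sum to 1."""
        scale = math.sqrt(len(self.query))
        scores = [dot(self.query, f) / scale for f in task_features.values()]
        return dict(zip(task_features, softmax(scores)))

    def forward(self, task_features):
        """Fuse task features and return a primary-task probability."""
        weights = self.attention_weights(task_features)
        dim = len(self.query)
        fused = [
            sum(weights[name] * feats[i] for name, feats in task_features.items())
            for i in range(dim)
        ]
        logit = dot(self.head, fused) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


# Made-up features standing in for the outputs of frozen task models.
features = {
    "primary": [0.2, -0.1, 0.4, 0.0],
    "aux_a": [0.1, 0.3, -0.2, 0.5],
    "aux_b": [-0.3, 0.2, 0.1, 0.1],
}
translator = ToyTaskTranslator(dim=4)
weights = translator.attention_weights(features)
prob = translator.forward(features)
```

Because the task models are treated as fixed feature extractors, this kind of design leaves the baseline architectures untouched, which is consistent with the claim in the abstract. The attention weights also give a rough indication of how much each auxiliary task contributes to a given prediction.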
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization
