Egocentric Video Task Translation
Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani

TL;DR
EgoTask Translation (EgoT2) is a unified framework that translates the outputs of models trained on different egocentric video tasks, improving performance on multiple tasks by capturing synergies between tasks and mitigating competition among them.
Contribution
EgoT2 introduces a novel task translation approach with separate backbones and a shared translator, enabling multi-task learning and transfer in egocentric video understanding.
Findings
Achieves top-ranked results on four Ego4D benchmark challenges.
Outperforms existing transfer paradigms in egocentric video tasks.
Demonstrates effectiveness across diverse video understanding tasks.
Abstract
Different video understanding tasks are typically treated in isolation, and even with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another). However, in wearable cameras, the immersive egocentric perspective of a person engaging with the world around them presents an interconnected web of video understanding tasks -- hand-object manipulations, navigation in the space, or human-human interactions -- that unfold continuously, driven by the person's goals. We argue that this calls for a much more unified approach. We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of them at once. Unlike traditional transfer or multi-task learning, EgoT2's flipped design entails separate task-specific backbones and a task translator…
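The "flipped" design described in the abstract (one backbone per task, plus a translator over their outputs) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the task names, the dimensions, the random stand-in weights, treating the backbones as frozen, and the single attention-style fusion step used as the translator.

```python
import math
import random

random.seed(0)

CLIP_DIM, FEAT_DIM, NUM_CLASSES = 8, 4, 3  # hypothetical sizes


def make_linear(in_dim, out_dim):
    """Random (out_dim x in_dim) weight matrix; a stand-in for learned parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product written with plain lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One separate backbone per task (hypothetical task names). In this sketch
# they are assumed to be trained already and are never updated.
backbones = {
    name: make_linear(CLIP_DIM, FEAT_DIM)
    for name in ("hand_object", "navigation", "social")
}

# Translator parameters: a query vector that scores each task's output,
# and a prediction head for the task of interest. Attention-style fusion
# is an assumption of this sketch.
query = [random.uniform(-0.5, 0.5) for _ in range(FEAT_DIM)]
head = make_linear(FEAT_DIM, NUM_CLASSES)


def translate(clip):
    """Fuse the outputs of all task backbones into one prediction."""
    # 1. Each task-specific backbone turns the clip into one feature token.
    tokens = [apply_linear(w, clip) for w in backbones.values()]
    # 2. The translator scores every task token against its query.
    scores = [
        sum(q * t for q, t in zip(query, token)) / math.sqrt(FEAT_DIM)
        for token in tokens
    ]
    task_weights = softmax(scores)
    # 3. Tokens are combined as a weighted sum, then mapped to class logits.
    fused = [
        sum(w * token[i] for w, token in zip(task_weights, tokens))
        for i in range(FEAT_DIM)
    ]
    return task_weights, apply_linear(head, fused)


clip = [random.uniform(-1, 1) for _ in range(CLIP_DIM)]  # stand-in for video features
task_weights, logits = translate(clip)
print(dict(zip(backbones, task_weights)))
print(logits)
```

In this sketch the per-task weights show how much each auxiliary task contributes to the fused prediction, which is one simple way to picture a translator drawing on task synergies. Only the translator parameters (query and head) would be trained in such a setup.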
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization
