Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives
Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro, Tatiana Tommasi, Giuseppe Averta

TL;DR
Hier-EgoPack is a hierarchical framework that enhances egocentric video understanding by enabling reasoning across multiple temporal granularities, promoting efficient multi-task learning and knowledge sharing.
Contribution
It introduces a novel hierarchical architecture with a GNN layer for multi-granularity temporal reasoning, expanding EgoPack's capabilities for diverse downstream tasks.
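The idea of reasoning over several temporal granularities can be illustrated with a toy example. The sketch below is not the authors' architecture: it assumes a video is given as a list of per-segment feature vectors, uses plain mean aggregation over temporally adjacent nodes as a stand-in for a GNN layer, and merges pairs of consecutive nodes to obtain a coarser level. All function names (`temporal_message_pass`, `coarsen`, `build_hierarchy`) and parameters (`radius`, `stride`, `levels`) are hypothetical.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def temporal_message_pass(nodes: List[Vector], radius: int = 1) -> List[Vector]:
    """One mean-aggregation step over a temporal graph.

    Assumption: node i is connected to every node within `radius`
    time steps, itself included. A real GNN layer would use learned weights.
    """
    out = []
    for i in range(len(nodes)):
        lo, hi = max(0, i - radius), min(len(nodes), i + radius + 1)
        out.append(mean(nodes[lo:hi]))
    return out


def coarsen(nodes: List[Vector], stride: int = 2) -> List[Vector]:
    """Merge each run of `stride` consecutive nodes into one coarser node."""
    return [mean(nodes[i:i + stride]) for i in range(0, len(nodes), stride)]


def build_hierarchy(features: List[Vector], levels: int = 3) -> List[List[Vector]]:
    """Return node features at each temporal granularity, finest first."""
    hierarchy = []
    current = features
    for _ in range(levels):
        current = temporal_message_pass(current)
        hierarchy.append(current)
        if len(current) == 1:
            break  # nothing left to coarsen
        current = coarsen(current)
    return hierarchy


# Eight hypothetical segment features of dimension 2.
segments = [[float(t), 1.0] for t in range(8)]
pyramid = build_hierarchy(segments, levels=3)
level_sizes = [len(level) for level in pyramid]
print(level_sizes)
```

In this toy setup, each level of `pyramid` covers the same video with fewer, temporally wider nodes, so a task that needs fine localisation could read the first level while a task that needs longer context could read a later one.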
Findings
Effective multi-task learning across diverse egocentric tasks
Improved reasoning at multiple temporal scales
Demonstrated superior performance on Ego4D benchmarks
Abstract
Our comprehension of video streams depicting human activities is naturally multifaceted: in just a few moments, we can grasp what is happening, identify the relevance and interactions of objects in the scene, and forecast what will happen soon, everything all at once. To endow autonomous systems with such a holistic perception, learning how to correlate concepts, abstract knowledge across diverse tasks, and leverage task synergies when learning novel skills is essential. A significant step in this direction is EgoPack, a unified framework for understanding human activities across diverse tasks with minimal overhead. EgoPack promotes information sharing and collaboration among downstream tasks, essential for efficiently learning new skills. In this paper, we introduce Hier-EgoPack, which advances EgoPack by enabling reasoning also across diverse temporal granularities, which expands its…
Taxonomy
Topics: Identity, Memory, and Therapy
