HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos
Simone Alberto Peirone, Francesca Pistilli, Giuseppe Averta

TL;DR
HiERO introduces a hierarchical approach to understanding human activities in egocentric videos, leveraging weak supervision and aligning video segments with descriptions to improve reasoning and achieve state-of-the-art results.
Contribution
The paper presents HiERO, a novel weakly-supervised hierarchical method that enhances video features with activity structures, improving reasoning in egocentric videos with minimal supervision.
Findings
Achieves state-of-the-art results on multiple benchmarks.
Outperforms fully-supervised methods in zero-shot procedure learning.
Demonstrates the importance of activity hierarchy in egocentric vision.
Abstract
Human activities are particularly complex and variable, and this makes it challenging for deep learning models to reason about them. However, we note that such variability does have an underlying structure, composed of a hierarchy of patterns of related actions. We argue that such structure can emerge naturally from unscripted videos of human activities, and can be leveraged to better reason about their content. We present HiERO, a weakly-supervised method to enrich video segment features with the corresponding hierarchical activity threads. By aligning video clips with their narrated descriptions, HiERO infers contextual, semantic and temporal reasoning with a hierarchical architecture. We prove the potential of our enriched features with multiple video-text alignment benchmarks (EgoMCQ, EgoNLQ) with minimal additional training, and in zero-shot for procedure learning tasks (EgoProceL…
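The abstract says HiERO aligns video clips with their narrated descriptions but, as quoted here, does not state the training objective. A common choice for this kind of clip-text alignment is a symmetric contrastive (InfoNCE-style) loss, and the sketch below illustrates that idea only. It is an assumption for illustration, not HiERO's actual objective: the function name `symmetric_alignment_loss`, the `temperature` value, and the toy embeddings are all hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    top = max(row)
    log_sum = top + math.log(sum(math.exp(x - top) for x in row))
    return [x - log_sum for x in row]


def symmetric_alignment_loss(clip_embs, text_embs, temperature=0.07):
    """Hypothetical InfoNCE-style loss: clip i is paired with narration i.

    NOTE: an illustrative assumption, not the objective used by HiERO.
    """
    clips = [_normalize(v) for v in clip_embs]
    texts = [_normalize(v) for v in text_embs]
    n = len(clips)

    # Cosine similarity between every clip and every narration.
    logits = [
        [sum(a * b for a, b in zip(c, t)) / temperature for t in texts]
        for c in clips
    ]

    # Clip-to-text direction: each row should peak at its own narration.
    clip_to_text = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-clip direction: each column should peak at its own clip.
    columns = [list(col) for col in zip(*logits)]
    text_to_clip = -sum(_log_softmax(columns[i])[i] for i in range(n)) / n

    return 0.5 * (clip_to_text + text_to_clip)


# Toy embeddings: correctly paired vs. deliberately mismatched narrations.
clips = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = symmetric_alignment_loss(clips, clips)
shuffled = symmetric_alignment_loss(clips, clips[1:] + clips[:1])
```

In this toy setup the correctly paired embeddings yield a lower loss than the shuffled ones, which is the behavior an alignment objective of this kind is meant to reward.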
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
