HiERO: understanding the hierarchy of human behavior enhances reasoning on egocentric videos

Simone Alberto Peirone; Francesca Pistilli; Giuseppe Averta

arXiv:2505.12911·cs.CV·May 20, 2025

TL;DR

HiERO introduces a hierarchical approach to understanding human activities in egocentric videos, leveraging weak supervision and aligning video segments with descriptions to improve reasoning and achieve state-of-the-art results.

Contribution

The paper presents HiERO, a novel weakly-supervised hierarchical method that enhances video features with activity structures, improving reasoning in egocentric videos with minimal supervision.

Findings

01. Achieves state-of-the-art results on multiple benchmarks.

02. Outperforms fully-supervised methods in zero-shot procedure learning.

03. Demonstrates the importance of activity hierarchy in egocentric vision.

Abstract

Human activities are particularly complex and variable, which makes it challenging for deep learning models to reason about them. However, we note that such variability does have an underlying structure, composed of a hierarchy of patterns of related actions. We argue that such structure can emerge naturally from unscripted videos of human activities, and can be leveraged to better reason about their content. We present HiERO, a weakly-supervised method to enrich video segment features with the corresponding hierarchical activity threads. By aligning video clips with their narrated descriptions, HiERO infers contextual, semantic and temporal reasoning with a hierarchical architecture. We demonstrate the potential of our enriched features on multiple video-text alignment benchmarks (EgoMCQ, EgoNLQ) with minimal additional training, and in zero-shot for procedure learning tasks (EgoProceL…
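
The abstract describes aligning video clips with their narrated descriptions. As an illustration only, the sketch below shows a generic symmetric InfoNCE contrastive objective, a common choice for clip-text alignment. It is not claimed to be HiERO's actual loss: the function names, the temperature value of 0.07, and the toy 2-D features are all assumptions made for this example, and feature vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax_row(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_info_nce(clip_feats, text_feats, temperature=0.07):
    """Generic symmetric InfoNCE loss (hypothetical helper, not HiERO's code).

    clip_feats[i] and text_feats[i] are assumed to be a matching
    clip/narration pair; every other pairing in the batch is a negative.
    """
    n = len(clip_feats)
    # Temperature-scaled similarity matrix: rows are clips, columns are texts.
    sim = [
        [cosine_similarity(c, t) / temperature for t in text_feats]
        for c in clip_feats
    ]
    # Clip-to-text direction: each clip should pick out its own narration.
    clip_to_text = -sum(log_softmax_row(sim[i])[i] for i in range(n)) / n
    # Text-to-clip direction: same computation on the transposed matrix.
    sim_t = [list(col) for col in zip(*sim)]
    text_to_clip = -sum(log_softmax_row(sim_t[i])[i] for i in range(n)) / n
    return 0.5 * (clip_to_text + text_to_clip)


# Toy features: two clips and two narrations in a 2-D embedding space.
clips = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = symmetric_info_nce(clips, aligned_texts)
loss_swapped = symmetric_info_nce(clips, swapped_texts)
```

With these toy inputs, the loss for correctly paired features is lower than for swapped pairs, which is the behaviour such an objective is meant to reward during training.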

Code & Models

Repositories

sapeirone/hiero (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis