Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos

Luigi Seminara; Giovanni Maria Farinella; Antonino Furnari

arXiv:2502.17753·cs.CV·February 27, 2025

TL;DR

This paper presents a gradient-based method for learning task graphs from procedural activities in egocentric videos. The method improves task graph prediction over hand-crafted approaches and can be integrated into neural architectures.

Contribution

It introduces a maximum likelihood approach that learns task graphs by directly optimizing edge weights with gradient descent, improving over hand-crafted methods on CaptainCook4D, EgoPER, and EgoProceL.

Findings

01. +14.5% F1-score improvement on CaptainCook4D

02. +19.8% improvement in online mistake detection on Assembly101-O

03. Top performance on the Ego-Exo4D procedure understanding benchmark

Abstract

We introduce a gradient-based approach for learning task graphs from procedural activities, improving over hand-crafted methods. Our method directly optimizes edge weights via maximum likelihood, enabling integration into neural architectures. We validate our approach on CaptainCook4D, EgoPER, and EgoProceL, achieving F1-score improvements of +14.5%, +10.2%, and +13.6%, respectively. Our feature-based approach for predicting task graphs from textual/video embeddings demonstrates emerging video understanding abilities. We also achieve top performance on the Ego-Exo4D procedure understanding benchmark and significantly improve online mistake detection (+19.8% on Assembly101-O, +6.4% on EPIC-Tent-O). Code: https://github.com/fpv-iplab/Differentiable-Task-Graph-Learning.
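
To make "directly optimizes edge weights via maximum likelihood" concrete, here is a minimal sketch in plain Python. It is not the authors' loss or code (the official repository uses PyTorch). It assumes a simplified likelihood in which the probability of the next key-step is proportional to the summed weight of its edges toward already-observed steps. The names `nll_and_grad` and `fit`, the START node at index 0, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nll_and_grad(W, sequences):
    """Negative log-likelihood of the sequences and its gradient w.r.t. W.

    Assumed toy model (not the paper's exact objective): Z = sigmoid(W),
    where Z[k][j] is the weight of the edge "step k depends on step j".
    The probability of picking step k next is proportional to the sum of
    Z[k][j] over the already observed steps j.
    """
    n = len(W)
    Z = [[sigmoid(w) for w in row] for row in W]
    loss = 0.0
    grad_Z = [[0.0] * n for _ in range(n)]
    for seq in sequences:
        observed = [0]            # node 0 is an assumed START node
        remaining = set(seq)
        for k in seq:
            score = {h: sum(Z[h][j] for j in observed) for h in remaining}
            total = sum(score.values())
            loss -= math.log(score[k] / total)
            # d(-log(score[k] / total)) / dZ[h][j] for every observed j
            for h in remaining:
                g = 1.0 / total - (1.0 / score[k] if h == k else 0.0)
                for j in observed:
                    grad_Z[h][j] += g
            observed.append(k)
            remaining.remove(k)
    # Chain rule through the sigmoid: dZ/dW = Z * (1 - Z)
    grad_W = [[grad_Z[a][b] * Z[a][b] * (1.0 - Z[a][b]) for b in range(n)]
              for a in range(n)]
    return loss, grad_W

def fit(sequences, n_nodes, steps=500, lr=0.1):
    """Plain gradient descent on the edge logits W, starting from zeros."""
    W = [[0.0] * n_nodes for _ in range(n_nodes)]
    history = []
    for _ in range(steps):
        loss, grad = nll_and_grad(W, sequences)
        history.append(loss)
        for a in range(n_nodes):
            for b in range(n_nodes):
                W[a][b] -= lr * grad[a][b]
    return [[sigmoid(w) for w in row] for row in W], history

# Toy data: 0 = START, 1 = A, 2 = B, 3 = C. A always precedes B; C is free.
SEQUENCES = [[1, 2, 3], [1, 3, 2], [3, 1, 2]]
Z, history = fit(SEQUENCES, n_nodes=4)
print("loss: %.3f -> %.3f" % (history[0], history[-1]))
print("Z[B][A] = %.2f, Z[B][START] = %.2f" % (Z[2][1], Z[2][0]))
```

In this toy setting, the learned weight of "B depends on A" ends up above the weight of "B depends on START", which is the kind of precondition structure a task graph is meant to capture. Because the objective is differentiable in the edge logits, the same idea carries over to an autograd framework, where the logits could be produced by a network instead of stored as free parameters.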

Code & Models

Repositories

fpv-iplab/differentiable-task-graph-learning
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Graph Neural Networks