Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos
Luigi Seminara, Giovanni Maria Farinella, Antonino Furnari

TL;DR
This paper presents a gradient-based method for learning task graphs from procedural activities in egocentric videos. It outperforms hand-crafted approaches and can be integrated into neural architectures.
Contribution
It introduces a maximum likelihood formulation that learns task graph edge weights by direct gradient-based optimization, surpassing hand-crafted methods on CaptainCook4D, EgoPER, and EgoProceL.
Findings
+14.5% F1-score improvement on CaptainCook4D
Improved online mistake detection by +19.8% on Assembly101-O
Top performance on the Ego-Exo4D procedure understanding benchmark
Abstract
We introduce a gradient-based approach for learning task graphs from procedural activities, improving over hand-crafted methods. Our method directly optimizes edge weights via maximum likelihood, enabling integration into neural architectures. We validate our approach on CaptainCook4D, EgoPER, and EgoProceL, achieving +14.5%, +10.2%, and +13.6% F1-score improvements. Our feature-based approach for predicting task graphs from textual/video embeddings demonstrates emerging video understanding abilities. We also achieved top performance on the procedure understanding benchmark on Ego-Exo4D and significantly improved online mistake detection (+19.8% on Assembly101-O, +6.4% on EPIC-Tent-O). Code: https://github.com/fpv-iplab/Differentiable-Task-Graph-Learning.
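To make "optimizes edge weights via maximum likelihood" concrete, here is a minimal sketch of one plausible likelihood for a key-step sequence under a weighted task graph. It is an illustration under stated assumptions, not the authors' exact formulation (see the linked repository for that). The function name `sequence_nll`, the convention that `Z[i][j]` weights the edge "step i depends on step j", the START node at index 0, and the `eps` smoothing term are all hypothetical choices made for this example.

```python
import math

def sequence_nll(Z, sequence, eps=1e-8):
    """Negative log-likelihood of one key-step sequence under edge weights Z.

    Assumed convention: Z[i][j] is the weight of the edge "step i depends on
    step j". Index 0 is a hypothetical START node that is always done.
    """
    n = len(Z)
    observed = [0]  # steps already performed (START counts as done)
    nll = 0.0
    for step in sequence:
        remaining = [k for k in range(n) if k not in observed]
        # Support for the step that actually happened next:
        # total weight of its edges into already-performed steps.
        num = sum(Z[step][j] for j in observed)
        # Support for every step that could have happened next.
        den = sum(Z[k][j] for k in remaining for j in observed)
        nll -= math.log((num + eps) / (den + eps))
        observed.append(step)
    return nll

# Chain graph START -> 1 -> 2 -> 3 (each step depends on the previous one).
chain = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],  # step 1 depends on START
    [0.0, 1.0, 0.0, 0.0],  # step 2 depends on step 1
    [0.0, 0.0, 1.0, 0.0],  # step 3 depends on step 2
]

consistent = sequence_nll(chain, [1, 2, 3])    # close to 0
inconsistent = sequence_nll(chain, [3, 2, 1])  # much larger
print(consistent, inconsistent)
```

Because the loss is a smooth function of the entries of `Z`, an automatic differentiation framework could in principle minimize it over a set of observed sequences by gradient descent, which is the sense in which such an objective can sit inside a neural architecture.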
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
