Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos

Luigi Seminara; Giovanni Maria Farinella; Antonino Furnari

arXiv:2406.01486·cs.CV·January 10, 2025

TL;DR

This paper introduces a differentiable, neural network-compatible method for learning task graphs from egocentric videos, improving procedural activity understanding and online mistake detection accuracy.

Contribution

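It presents a gradient-based approach that learns task graphs by direct maximum likelihood optimization of edge weights, in place of the hand-crafted extraction procedures that previous works generally relied on. Because the approach is differentiable, it can be plugged into neural network architectures and used for online mistake detection.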

Findings

01. Improved task graph prediction on the CaptainCook4D dataset by +16.7% over previous approaches.

02. Improved online mistake detection by +19.8% and +7.5% on two datasets.

03. Demonstrated emerging video understanding abilities when learning from textual and video embeddings.

Abstract

Procedural activities are sequences of key-steps aimed at achieving specific goals. They are crucial to build intelligent agents able to assist users effectively. In this context, task graphs have emerged as a human-understandable representation of procedural activities, encoding a partial ordering over the key-steps. While previous works generally relied on hand-crafted procedures to extract task graphs from videos, in this paper, we propose an approach based on direct maximum likelihood optimization of edges' weights, which allows gradient-based learning of task graphs and can be naturally plugged into neural network architectures. Experiments on the CaptainCook4D dataset demonstrate the ability of our approach to predict accurate task graphs from the observation of action sequences, with an improvement of +16.7% over previous approaches. Owing to the differentiability of the proposed…
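To make the idea of gradient-based learning of edge weights concrete, here is a minimal sketch in plain Python. It is not the paper's objective: it assumes a simplified model in which every ordered pair of key-steps has an independent Bernoulli edge weight, and it fits those weights by gradient descent on the negative log-likelihood of the observed orderings. The names `learn_edge_weights` and `to_graph`, the learning rate, the epoch count, and the 0.9 threshold are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def learn_edge_weights(sequences, num_steps, lr=0.5, epochs=500):
    """Fit weights[i][j], the estimated probability that step j precedes step i.

    Assumption: a per-edge Bernoulli likelihood, used here as a simplified
    stand-in for the maximum likelihood objective described in the abstract.
    """
    # One free logit per ordered pair (i, j); the diagonal is never updated.
    logits = [[0.0] * num_steps for _ in range(num_steps)]
    for _ in range(epochs):
        grad = [[0.0] * num_steps for _ in range(num_steps)]
        for seq in sequences:
            for t, cur in enumerate(seq):
                past = set(seq[:t])
                for other in range(num_steps):
                    if other == cur:
                        continue
                    # Target is 1 if `other` was already observed before `cur`.
                    target = 1.0 if other in past else 0.0
                    # Gradient of the Bernoulli negative log-likelihood
                    # with respect to the logit is sigmoid(logit) - target.
                    grad[cur][other] += sigmoid(logits[cur][other]) - target
        for i in range(num_steps):
            for j in range(num_steps):
                logits[i][j] -= lr * grad[i][j] / len(sequences)
    return [[sigmoid(v) for v in row] for row in logits]


def to_graph(weights, threshold=0.9):
    """Keep an edge (j, i), read as 'j is a precondition of i', if its weight is high."""
    return {
        (j, i)
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
        if i != j and w > threshold
    }


# Toy data: step 0 always comes first, steps 1 and 2 may swap, step 3 is last.
sequences = [[0, 1, 2, 3], [0, 2, 1, 3]]
weights = learn_edge_weights(sequences, num_steps=4)
graph = to_graph(weights)
```

In this toy setting the weight between steps 1 and 2 stays at 0.5 because the two orderings are equally frequent, so thresholding leaves them unordered, which is the kind of partial ordering a task graph encodes. The thresholded graph also keeps transitive edges such as (0, 3); a real pipeline would likely prune those.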

Code & Models

Repositories

fpv-iplab/differentiable-task-graph-learning (PyTorch, official)

Videos

Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos · SlidesLive

Taxonomy

Topics: Advanced Graph Neural Networks · Online Learning and Analytics · Human Pose and Action Recognition