Keystep Recognition using Graph Neural Networks

Julia Lee Romero; Kyle Min; Subarna Tripathi; Morteza Karimzadeh

arXiv:2506.01102 · cs.CV · February 11, 2026

Open Access

TL;DR

This paper introduces GLEVR, a graph neural network framework for keystep recognition in egocentric videos, leveraging long-term dependencies and multimodal data to outperform existing methods.

Contribution

The paper proposes a novel graph-learning framework, GLEVR, for fine-grained keystep recognition that effectively utilizes long-term dependencies and multimodal data in egocentric videos.

Findings

01 GLEVR outperforms existing models on the Ego-Exo4D dataset.

02 The constructed graphs are sparse, which improves computational efficiency.

03 Alignment with exocentric videos during training improves inference accuracy on egocentric videos.

Abstract

We pose keystep recognition as a node classification task and propose a flexible graph-learning framework for fine-grained keystep recognition that effectively leverages long-term dependencies in egocentric videos. Our approach, termed GLEVR, constructs a graph in which each clip of the egocentric video corresponds to a node. The constructed graphs are sparse and computationally efficient, and the approach substantially outperforms existing larger models. We further leverage alignment between egocentric and exocentric videos during training for improved inference on egocentric videos, and we add automatic captioning as an additional modality. We consider each clip of each exocentric video (if available) or video captions as additional nodes during training. We examine several strategies to define connections across these nodes. We perform extensive experiments on the…
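
The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract says several connection strategies are examined without detailing them, so the temporal window, the one-to-one ego/exo alignment, and the names `build_clip_graph` and `mean_aggregate` are all assumptions made for illustration. The sketch treats each clip as a node, links temporally nearby egocentric clips, optionally links each exocentric clip to the egocentric clip at the same position, and applies one round of neighbor averaging of the kind a graph neural network layer builds on. The trained classifier is omitted.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def build_clip_graph(num_ego: int, num_exo: int = 0, window: int = 1) -> List[Edge]:
    """Return undirected edges of a sparse clip graph (hypothetical strategy).

    Nodes 0 .. num_ego-1 are egocentric clips in temporal order.
    Nodes num_ego .. num_ego+num_exo-1 are exocentric clips, assumed here
    to be time-aligned one-to-one with the first egocentric clips.
    """
    edges: List[Edge] = []
    # Temporal edges: connect each ego clip to the next `window` ego clips.
    for i in range(num_ego):
        for j in range(i + 1, min(i + window, num_ego - 1) + 1):
            edges.append((i, j))
    # Cross-view edges: connect exo clip k to ego clip k (assumed alignment).
    for k in range(min(num_exo, num_ego)):
        edges.append((k, num_ego + k))
    return edges


def mean_aggregate(features: List[List[float]], edges: List[Edge]) -> List[List[float]]:
    """One round of message passing: average each node with its neighbors."""
    n = len(features)
    neighbors = {i: [i] for i in range(n)}  # every node includes itself
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbors[i]) / len(neighbors[i]) for d in range(dim)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Four ego clips plus two aligned exo clips, window of one clip.
    edges = build_clip_graph(num_ego=4, num_exo=2, window=1)
    print(edges)
    # Toy one-dimensional clip features for the six nodes.
    feats = [[0.0], [3.0], [6.0], [9.0], [1.0], [2.0]]
    print(mean_aggregate(feats, edges))
```

Under this windowed scheme the number of temporal edges grows roughly linearly with the number of clips rather than quadratically, which is one way a clip graph can stay sparse; whether GLEVR uses this exact scheme is not stated in the text above.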

Taxonomy

Topics: Hand Gesture Recognition Systems · Handwritten Text Recognition Techniques