Intention-Guided Cognitive Reasoning for Egocentric Long-Term Action Anticipation

Qiaohui Chu; Haoyu Zhang; Meng Liu; Yisen Feng; Haoxiang Shi; Liqiang Nie

arXiv:2508.01742 · cs.CV · November 18, 2025

TL;DR

This paper introduces INSIGHT, a novel two-stage framework for egocentric long-term action anticipation that leverages semantic cues and explicit cognitive reasoning to improve prediction accuracy and generalization.

Contribution

It proposes a unified approach combining semantic feature extraction and reinforcement learning-based reasoning for better long-term action anticipation.

Findings

01. Achieves state-of-the-art results on the Ego4D, EPIC-Kitchens-55, and EGTEA Gaze+ datasets.

02. Effectively utilizes hand-object interaction cues and verb-noun semantics.

03. Demonstrates strong generalization across diverse egocentric datasets.

Abstract

Long-term action anticipation from egocentric video is critical for applications such as human-computer interaction and assistive technologies, where anticipating user intent enables proactive and context-aware AI assistance. However, existing approaches suffer from three key limitations: 1) underutilization of fine-grained visual cues from hand-object interactions, 2) neglect of semantic dependencies between verbs and nouns, and 3) lack of explicit cognitive reasoning, limiting generalization and long-term forecasting ability. To overcome these challenges, we propose INSIGHT, a unified two-stage framework for egocentric action anticipation. In the first stage, INSIGHT focuses on extracting semantically rich features from hand-object interaction regions and enhances action representations using a verb-noun co-occurrence matrix. In the second stage, it introduces a reinforcement…
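To make the verb-noun co-occurrence idea from the abstract concrete, here is a minimal sketch. It is not INSIGHT's implementation: the function names, the Laplace smoothing, the rescoring rule, and the toy labels are all assumptions made for illustration. It counts how often each verb appears with each noun, converts the counts into an estimate of P(noun | verb), and uses that estimate to reweight a noun prediction given a verb prediction.

```python
from collections import Counter


def build_cooccurrence(pairs, verbs, nouns):
    """Count how often each (verb, noun) pair appears in the labels."""
    counts = Counter(pairs)
    return [[counts[(v, n)] for n in nouns] for v in verbs]


def row_normalize(matrix, smoothing=1.0):
    """Turn counts into P(noun | verb) with additive smoothing (an assumption)."""
    normalized = []
    for row in matrix:
        total = sum(row) + smoothing * len(row)
        normalized.append([(c + smoothing) / total for c in row])
    return normalized


def rescore_nouns(verb_probs, noun_probs, cond):
    """Reweight noun probabilities by the noun prior implied by the verb distribution."""
    prior = [
        sum(verb_probs[i] * cond[i][j] for i in range(len(verb_probs)))
        for j in range(len(noun_probs))
    ]
    scores = [p * q for p, q in zip(noun_probs, prior)]
    z = sum(scores)
    return [s / z for s in scores]


# Toy labels, hypothetical and not drawn from any dataset.
verbs = ["cut", "pour"]
nouns = ["onion", "water"]
pairs = [("cut", "onion"), ("cut", "onion"), ("pour", "water"), ("cut", "water")]

counts = build_cooccurrence(pairs, verbs, nouns)
cond = row_normalize(counts)

# A confident "cut" prediction shifts an undecided noun prediction toward "onion".
rescored = rescore_nouns([0.9, 0.1], [0.5, 0.5], cond)
print(rescored)
```

In this toy setup the noun prediction starts out undecided, and the co-occurrence prior alone tips it toward the noun that most often accompanies the predicted verb. The abstract does not say how INSIGHT applies the matrix to its learned action representations, so this should be read only as an illustration of the general idea.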

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Action Observation and Synchronization