EgoPrompt: Prompt Learning for Egocentric Action Recognition

Huaihai Lyu; Chaofan Chen; Yuheng Ji; Changsheng Xu

arXiv:2508.03266 · cs.CV · August 8, 2025


TL;DR

EgoPrompt introduces a prompt learning framework that models the semantic and contextual relationships between verbs and nouns in egocentric action recognition, achieving state-of-the-art results on Ego4D, EPIC-Kitchens, and EGTEA.

Contribution

The paper proposes a prompt learning approach built on a Unified Prompt Pool and Diverse Pool Criteria, which improve how component-specific (verb and noun) knowledge is integrated for egocentric action recognition.
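To make the idea of a shared prompt pool concrete, here is a minimal sketch of a generic key-query prompt pool, assuming that each pool entry pairs a learnable key with a prompt vector and that verb and noun branches both select prompts from the same pool by cosine similarity. This is not EgoPrompt's implementation: the summary does not specify the pool's selection rule, and the names `PromptPool`, `select`, and `top_k` are hypothetical. The Diverse Pool Criteria are not modeled here.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class PromptPool:
    """Hypothetical shared pool: each entry pairs a key with a prompt vector."""

    def __init__(self, keys: List[Vector], prompts: List[Vector]) -> None:
        if len(keys) != len(prompts):
            raise ValueError("keys and prompts must have the same length")
        self.keys = keys
        self.prompts = prompts

    def select(self, query: Vector, top_k: int) -> List[Tuple[int, Vector]]:
        # Rank entries by similarity between the query feature and each key;
        # ties are broken by pool index so the result is deterministic.
        scored = [(cosine(query, k), i) for i, k in enumerate(self.keys)]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [(i, self.prompts[i]) for _, i in scored[:top_k]]
```

With keys `[[1, 0], [0, 1], [1, 1]]`, a verb-like query `[1.0, 0.1]` selects entries 0 and 2, while a noun-like query `[0.1, 1.0]` selects entries 1 and 2. Both branches draw from one pool, so an entry such as 2 can carry knowledge shared across components.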

Findings

01

Achieves state-of-the-art performance on the Ego4D, EPIC-Kitchens, and EGTEA datasets.

02

Effectively models cross-component interactions via attention-based fusion.

03

Demonstrates strong generalization in cross-dataset and base-to-novel scenarios.
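Finding 02 refers to attention-based fusion across the verb and noun components. The sketch below shows one generic form of this, assuming bidirectional scaled dot-product cross-attention with residual connections: verb tokens attend to noun tokens and vice versa. The token layout, the residual design, and the function names are assumptions for illustration, not the paper's architecture.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over all key rows."""
    d = len(keys[0])
    out: Matrix = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value rows, computed one output column at a time.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise sum of two equally shaped matrices.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fuse(verb_tokens: Matrix, noun_tokens: Matrix) -> Tuple[Matrix, Matrix]:
    # Each branch queries the other; the residual keeps its own features
    # alongside the cross-component context it gathered.
    verb_ctx = cross_attention(verb_tokens, noun_tokens, noun_tokens)
    noun_ctx = cross_attention(noun_tokens, verb_tokens, verb_tokens)
    return add(verb_tokens, verb_ctx), add(noun_tokens, noun_ctx)
```

With a single noun token, every verb token's attention weight on it is exactly 1, so `fuse([[1.0, 0.0], [0.0, 1.0]], [[2.0, 3.0]])` returns verb features `[[3.0, 3.0], [2.0, 4.0]]`. The output shapes always match the inputs, which allows the fused tokens to feed the same downstream heads.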

Abstract

Driven by the increasing demand for applications in augmented and virtual reality, egocentric action recognition has emerged as a prominent research area. It is typically divided into two subtasks: recognizing the performed behavior (i.e., verb component) and identifying the objects being acted upon (i.e., noun component) from the first-person perspective. However, most existing approaches treat these two components as independent classification tasks, focusing on extracting component-specific knowledge while overlooking their inherent semantic and contextual relationships, leading to fragmented representations and sub-optimal generalization capability. To address these challenges, we propose a prompt learning-based framework, EgoPrompt, to conduct the egocentric action recognition task. Building on the existing prompting strategy to capture the component-specific knowledge, we…
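The abstract criticizes treating verb and noun recognition as independent classification tasks. A common way to combine two such heads is to score each action as the product of the verb and noun probabilities. The minimal sketch below assumes that baseline. It illustrates the setting the abstract argues against and is not part of EgoPrompt. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_scores(verb_logits: List[float], noun_logits: List[float]) -> List[List[float]]:
    # Two independent heads: p(action = (v, n)) is taken to be p(v) * p(n),
    # so no verb-noun relationship enters the score.
    p_verb = softmax(verb_logits)
    p_noun = softmax(noun_logits)
    return [[pv * pn for pn in p_noun] for pv in p_verb]


def predict_action(verb_logits: List[float], noun_logits: List[float]) -> Tuple[int, int]:
    # Return the (verb index, noun index) pair with the highest joint score.
    scores = action_scores(verb_logits, noun_logits)
    pairs = [(v, n) for v in range(len(scores)) for n in range(len(scores[0]))]
    return max(pairs, key=lambda vn: scores[vn[0]][vn[1]])
```

Because the joint score factorizes, `predict_action` always returns the top verb paired with the top noun (here `predict_action([2.0, 0.5], [0.1, 1.5])` gives `(0, 1)`). It does this whether or not that verb-noun pairing is plausible. Modeling the relationship between the two components is intended to address exactly this gap.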
