Domain Generalization using Action Sequences for Egocentric Action Recognition

Amirshayan Nasirimajd; Chiara Plizzari; Simone Alberto Peirone; Marco Ciccone; Giuseppe Averta; Barbara Caputo

arXiv:2506.17685 · cs.CV · June 24, 2025

TL;DR

This paper introduces SeqDG, a domain generalization method for egocentric action recognition that leverages action sequences and a visual-text reconstruction task to improve performance across unseen environments.

Contribution

The paper proposes SeqDG, an approach that combines action-sequence reconstruction with training on mixed-domain sequences to improve cross-domain generalization in egocentric action recognition.
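To make the idea of mixed-domain training concrete, here is a minimal sketch of one way such sequences could be assembled. It is not the authors' implementation: the `Clip` record, the `mix_sequence` function, the `mix_prob` parameter, and the kitchen names are all hypothetical, and the swapping rule (replace a clip with a same-label clip from another domain) is an assumption chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    clip_id: str
    label: str   # action label, e.g. "open fridge"
    domain: str  # environment identifier (hypothetical naming)


def mix_sequence(sequence, pool, mix_prob=0.5, rng=None):
    """Return a copy of `sequence` where some clips are swapped for
    same-label clips from a different domain (assumed mixing rule)."""
    rng = rng or random.Random()
    mixed = []
    for clip in sequence:
        # Candidates share the action label but come from another domain.
        candidates = [c for c in pool
                      if c.label == clip.label and c.domain != clip.domain]
        if candidates and rng.random() < mix_prob:
            mixed.append(rng.choice(candidates))
        else:
            mixed.append(clip)  # no candidate, or not selected for mixing
    return mixed


# Toy data: three actions recorded in kitchen_A, two of them also in kitchen_B.
pool = [
    Clip("a1", "open fridge", "kitchen_A"),
    Clip("a2", "take milk", "kitchen_A"),
    Clip("a3", "close fridge", "kitchen_A"),
    Clip("b1", "open fridge", "kitchen_B"),
    Clip("b2", "take milk", "kitchen_B"),
]
sequence = pool[:3]
mixed = mix_sequence(sequence, pool, mix_prob=1.0, rng=random.Random(0))
print([(c.clip_id, c.label, c.domain) for c in mixed])
```

In this sketch the order of action labels is preserved while the visual source of individual clips changes, so the sequence-level structure stays the same across domains.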

Findings

1. Achieved a +2.4% relative improvement on EPIC-KITCHENS-100 in cross-domain settings.

2. Improved intra-domain accuracy on the EGTEA dataset by +0.6%.

3. Validated SeqDG on two egocentric datasets, EPIC-KITCHENS-100 and EGTEA.

Abstract

Recognizing human activities from visual inputs, particularly through a first-person viewpoint, is essential for enabling robots to replicate human behavior. Egocentric vision, characterized by cameras worn by observers, captures diverse changes in illumination, viewpoint, and environment. This variability leads to a notable drop in the performance of Egocentric Action Recognition models when tested in environments not seen during training. In this paper, we tackle these challenges by proposing a domain generalization approach for Egocentric Action Recognition. Our insight is that action sequences often reflect consistent user intent across visual domains. By leveraging action sequences, we aim to enhance the model's generalization ability across unseen environments. Our proposed method, named SeqDG, introduces a visual-text sequence reconstruction objective (SeqRec) that uses…
