Test-time Ego-Exo-centric Adaptation for Action Anticipation via Multi-Label Prototype Growing and Dual-Clue Consistency

Zhaofeng Shi; Heqian Qiu; Lanxiao Wang; Qingbo Wu; Fanman Meng; Lili Pan; Hongliang Li

arXiv:2603.09798·cs.CV·March 25, 2026

TL;DR

This paper introduces a test-time adaptation method for action anticipation across egocentric and exocentric views. It uses multi-label prototype growing and dual-clue consistency to adapt a source-view-trained model online, without training on target-view data.

Contribution

The paper proposes the first test-time Ego-Exo adaptation framework for action anticipation, built on a Dual-Clue enhanced Prototype Growing Network (DCPGN) that addresses the multi-label and cross-modality challenges.

Findings

01. Outperforms state-of-the-art methods on the EgoMe-anti and EgoExoLearn benchmarks.

02. Adapts online during testing without additional target-view training.

03. Uses multi-label knowledge and textual-visual clues to improve action anticipation.

Abstract

Efficient adaptation between Egocentric (Ego) and Exocentric (Exo) views is crucial for applications such as human-robot cooperation. However, the success of most existing Ego-Exo adaptation methods relies heavily on target-view data for training, thereby increasing computational and data collection costs. In this paper, we make the first exploration of a Test-time Ego-Exo Adaptation for Action Anticipation (TE$^{2}$A$^{3}$) task, which aims to adjust the source-view-trained model online during test time to anticipate target-view actions. It is challenging for existing Test-Time Adaptation (TTA) methods to address this task due to the multi-action candidates and significant temporal-spatial inter-view gap. Hence, we propose a novel Dual-Clue enhanced Prototype Growing Network (DCPGN), which accumulates multi-label knowledge and integrates cross-modality clues for effective test-time…

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis