Improving Viewpoint-Invariance and Temporal Consistency for Action Detection

Yannick Porto; Renato Martins; Thomas Chalumeau; Cedric Demonceaux

arXiv:2605.22695·cs.CV·May 22, 2026

TL;DR

This paper proposes a two-stage action detection method that enhances viewpoint invariance and temporal consistency in untrimmed videos by using augmented virtual viewpoints and a multi-scale temporal encoder.

Contribution

It introduces a novel training strategy with virtual viewpoint augmentation and a view-invariant temporal encoder for improved action detection.

Findings

01. Significantly outperforms state-of-the-art methods on PKU-MMD and BABEL benchmarks.

02. Effectively models fine-grained temporal relationships across motion windows.

03. Enhances viewpoint invariance in action detection.

Abstract

Invariance to viewpoint changes and temporal consistency of actions are critical for the effective deployment of human action detection in untrimmed videos. Existing appearance-based video detection methods often struggle with limited viewpoint diversity during training, while motion-based detection approaches frequently fail to model fine-grained temporal relationships across consecutive motion windows. This paper introduces a novel two-stage action detection approach designed to improve both view invariance and global temporal coherence. In the first stage, we extract motion features from augmented virtual viewpoints, which are used only during training. In the second stage, a new view-invariant, multi-scale temporal encoder based on selective state-space sequence modelling aggregates information across viewpoints and time scales. Experiments on PKU-MMD and BABEL…
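The abstract does not say how the virtual viewpoints are generated. As a rough illustration of the general idea only, the sketch below assumes the input is a 3D skeleton sequence of shape (T, J, 3) and simulates extra camera viewpoints by rotating the joints about the vertical axis. The function names, the choice of the y axis as vertical, and the angle set are all hypothetical and are not taken from the paper.

```python
import numpy as np


def rotation_about_y(angle_rad):
    """Return the 3x3 matrix rotating points about the y axis (assumed vertical)."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def virtual_viewpoints(skeleton_seq, angles_deg):
    """Rotate a (T, J, 3) joint sequence to simulate extra camera viewpoints.

    Hypothetical helper, not the paper's implementation.
    Returns an array of shape (V, T, J, 3), one rotated copy per angle.
    """
    skeleton_seq = np.asarray(skeleton_seq, dtype=float)
    # Centre on the mean joint position so the rotation is about the subject.
    centre = skeleton_seq.mean(axis=(0, 1), keepdims=True)
    centred = skeleton_seq - centre
    views = []
    for angle in angles_deg:
        rot = rotation_about_y(np.deg2rad(angle))
        # Row-vector convention: p @ R.T applies R to every joint position.
        views.append(centred @ rot.T + centre)
    return np.stack(views)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(5, 4, 3))  # 5 frames, 4 joints (toy data)
    views = virtual_viewpoints(seq, angles_deg=[0, 90, 180])
    print(views.shape)
```

Because a rotation is a rigid transform, distances between joints are identical in every simulated view, so only the apparent viewpoint changes and the motion itself is untouched. In a training-only augmentation scheme like the one the abstract describes, such copies would be generated for training and dropped at inference.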

Code & Models

Repositories

https://icb-vision-ai.github.io/HydraView-TAD
github
