T-RECS: Training for Rate-Invariant Embeddings by Controlling Speed for Action Recognition
Madan Ravi Ganesh, Eric Hofesmann, Byungsu Min, Nadha Gafoor, Jason J. Corso

TL;DR
This paper introduces T-RECS, a preprocessing technique that enhances action recognition models by making them invariant to input video speed variations, improving accuracy and stability across diverse datasets.
Contribution
The paper proposes T-RECS, a speed-adaptive input resampling method that improves rate-invariance in deep action recognition models, applicable to multiple architectures.
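To make the idea of resampling an input clip at a chosen speed concrete, the following is a minimal sketch and not the authors' implementation. It assumes a video stored as a NumPy array of shape (frames, height, width, channels) and uses simple nearest-frame (floor) index selection. The function names `resample_indices` and `resample_clip`, the `speed` parameter, and the choice to repeat the last frame when the video runs out are all illustrative assumptions; how T-RECS selects or adapts the speed is not described in this summary.

```python
import numpy as np


def resample_indices(num_frames, speed, clip_length):
    """Pick clip_length source-frame indices that play the video at `speed`x.

    Hypothetical helper: speed > 1 skips frames (faster playback),
    speed < 1 repeats frames (slower playback).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Output frame k is taken from source position k * speed.
    positions = np.arange(clip_length) * speed
    # Floor to a valid frame index; clamp so that a video that is too
    # short repeats its last frame (an assumption made for this sketch).
    return np.clip(np.floor(positions).astype(int), 0, num_frames - 1)


def resample_clip(frames, speed, clip_length):
    """Return a clip of clip_length frames resampled at the given speed."""
    idx = resample_indices(len(frames), speed, clip_length)
    return frames[idx]


# Usage: a dummy 10-frame video of 2x2 RGB frames, resampled at 2x speed.
video = np.zeros((10, 2, 2, 3), dtype=np.uint8)
fast_clip = resample_clip(video, speed=2.0, clip_length=4)
```

A preprocessing step of this kind sits in front of the network, so the same model can be fed the same action at several playback rates during training or evaluation without architectural changes.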
Findings
T-RECS improves I3D model performance by at least 2.9% on HMDB51.
T-RECS increases stability of C3D models by 59% on HMDB51.
T-RECS is model-agnostic and effective across different architectures.
Abstract
An action should remain identifiable when modifying its speed: consider the contrast between an expert chef and a novice chef each chopping an onion. Here, we expect the novice chef to have a relatively measured and slow approach to chopping when compared to the expert. In general, the speed at which actions are performed, whether slower or faster than average, should not dictate how they are recognized. We explore the erratic behavior caused by this phenomenon on state-of-the-art deep network-based methods for action recognition in terms of maximum performance and stability in recognition accuracy across a range of input video speeds. By observing the trends in these metrics and summarizing them based on expected temporal behaviour w.r.t. variations in input video speeds, we find two distinct types of network architectures. In this paper, we propose a preprocessing method named T-RECS,…
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
