Video Test-Time Adaptation for Action Recognition

Wei Lin; Muhammad Jehanzeb Mirza; Mateusz Kozinski; Horst Possegger; Hilde Kuehne; Horst Bischof

arXiv:2211.15393·cs.CV·March 22, 2023

TL;DR

This paper introduces a test-time adaptation method for video action recognition that aligns feature distributions and enforces prediction consistency, significantly improving performance under distribution shifts.

Contribution

It presents a novel, architecture-agnostic test-time adaptation technique tailored for spatio-temporal models in video action recognition.

Findings

01 Significantly boosts performance on three benchmark action recognition datasets.

02 Effective on both convolutional and transformer architectures.

03 Outperforms existing test-time adaptation methods.

Abstract

Although action recognition systems can achieve top performance when evaluated on in-distribution test points, they are vulnerable to unanticipated distribution shifts in test data. However, test-time adaptation of video action recognition models against common distribution shifts has so far not been demonstrated. We propose to address this problem with an approach tailored to spatio-temporal models that can adapt on a single video sample per step. It consists of a feature distribution alignment technique that aligns online estimates of test set statistics towards the training statistics. We further enforce prediction consistency over temporally augmented views of the same test video sample. Evaluations on three benchmark action recognition datasets show that our proposed technique is architecture-agnostic and able to significantly boost the performance on both, the…
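The two ingredients named in the abstract, aligning online test statistics with training statistics and enforcing consistency across temporally augmented views, can be illustrated with a small sketch. This is not the authors' implementation: the exponential moving average, the L1 distance, the momentum value, and every function name below are assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def channel_stats(features):
    """Per-channel mean and variance over one video's feature vectors.

    features: list of equal-length lists, one per spatio-temporal location.
    """
    n = len(features)
    dims = len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dims)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n for d in range(dims)]
    return mean, var


def ema_update(running, current, momentum=0.1):
    """Online estimate: blend the running statistic with the newest sample.

    The momentum value is an assumed hyperparameter.
    """
    return [(1 - momentum) * r + momentum * c for r, c in zip(running, current)]


def alignment_loss(online_mean, online_var, train_mean, train_var):
    """Assumed L1 distance between online test statistics and training statistics."""
    mean_gap = sum(abs(a - b) for a, b in zip(online_mean, train_mean))
    var_gap = sum(abs(a - b) for a, b in zip(online_var, train_var))
    return mean_gap + var_gap


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(logits_view_a, logits_view_b):
    """Assumed L1 gap between class probabilities of two temporal views."""
    probs_a = softmax(logits_view_a)
    probs_b = softmax(logits_view_b)
    return sum(abs(a - b) for a, b in zip(probs_a, probs_b))


# One adaptation step on a single (toy) test video.
train_mean, train_var = [0.0, 0.0], [1.0, 1.0]      # stored training statistics
running_mean, running_var = train_mean, train_var   # initial online estimates

video_features = [[1.0, 2.0], [3.0, 4.0]]           # hypothetical feature vectors
mean, var = channel_stats(video_features)
running_mean = ema_update(running_mean, mean)
running_var = ema_update(running_var, var)

total_loss = (
    alignment_loss(running_mean, running_var, train_mean, train_var)
    + consistency_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8])
)
```

In a real model, `total_loss` would be minimized with respect to the network parameters after each test video; how the two terms are weighted against each other is left open here.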

Code & Models

Repositories

wlin-at/vitta (PyTorch, official)

Datasets

wlin21at/ViTTA (dataset, 117 downloads)

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications

Methods: Attention Is All You Need · Test · Stochastic Depth · Layer Normalization · Softmax · Adam · Dropout · Byte Pair Encoding · Swin Transformer · Position-Wise Feed-Forward Layer