Audio-Adaptive Activity Recognition Across Video Domains

Yunhua Zhang; Hazel Doughty; Ling Shao; Cees G. M. Snoek

arXiv:2203.14240 · cs.CV · March 30, 2022

TL;DR

This paper introduces an audio-adaptive approach for activity recognition that leverages activity sounds to improve domain adaptation across different video settings, addressing challenges like actor and scenery changes.

Contribution

It proposes a novel audio-adaptive encoder and an audio-infused recognizer that improve cross-domain activity recognition by exploiting activity sounds, which vary less across domains than visual appearance.

Findings

01. Effective in reducing domain shift in activity recognition

02. Improves recognition accuracy on new datasets and scenarios

03. Addresses actor shift with a new dataset

Abstract

This paper strives for activity recognition under domain shift, for example caused by a change of scenery or camera viewpoint. The leading approaches reduce the shift in activity appearance by adversarial training and self-supervised learning. Unlike these vision-focused works, we leverage activity sounds for domain adaptation, as they have less variance across domains and can reliably indicate which activities are not happening. We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation and address shifts in the semantic distribution. To further eliminate domain-specific features and include domain-invariant activity sounds for recognition, an audio-infused recognizer is proposed, which effectively models the cross-modal interaction across domains. We also introduce the new task of actor shift, with…

Code & Models

Repositories

xiaobai1217/DomainAdaptation (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Human Pose and Action Recognition