TL;DR
This paper presents a novel audio-based activity recognition framework that uses large-scale acoustic embeddings from online videos, reaching 64.2% top-1 and 83.6% top-3 accuracy on activities of daily living (ADL) without extensive manual data annotation.
Contribution
The study introduces a scalable approach that combines embeddings from public online videos with deep learning for activity recognition, reducing the need for manual annotation of audio data.
Findings
Achieved 64.2% top-1 accuracy in recognizing activities of daily living (ADL)
Achieved 83.6% top-3 accuracy in ADL recognition
Demonstrated robustness and analyzed co-occurrence of activities
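The findings report both top-1 and top-3 accuracy. To make the difference concrete, here is a minimal NumPy sketch of how top-k accuracy is usually computed (this is not the authors' evaluation code): a prediction counts as correct at top-k if the true activity is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not data from the paper.

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: (n_samples, n_classes) array of classifier scores or probabilities.
    labels: (n_samples,) array of integer class indices.
    """
    # Column indices of the k largest scores in each row (order within the top k is irrelevant).
    top_k = np.argsort(scores, axis=1)[:, -k:]
    # A sample is a hit if its true label appears anywhere in its top-k set.
    hits = np.any(top_k == labels[:, None], axis=1)
    return float(hits.mean())

# Toy example: 4 samples, 3 hypothetical activity classes (made-up scores).
scores = np.array([
    [0.7, 0.2, 0.1],  # true class 0: ranked 1st
    [0.5, 0.3, 0.2],  # true class 1: ranked 2nd
    [0.1, 0.1, 0.8],  # true class 2: ranked 1st
    [0.6, 0.3, 0.1],  # true class 2: ranked 3rd
])
labels = np.array([0, 1, 2, 2])
print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 0.75
```

Top-k accuracy is always non-decreasing in k, which is why a top-3 figure is higher than the corresponding top-1 figure for the same classifier.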
Abstract
Over the years, activity sensing and recognition has been shown to play a key enabling role in a wide range of applications, from sustainability and human-computer interaction to health care. While many recognition tasks have traditionally employed inertial sensors, acoustic-based methods offer the benefit of capturing rich contextual information, which can be useful when discriminating complex activities. Given the emergence of deep learning techniques and new, large-scale multimedia datasets, this paper revisits the opportunity of training audio-based classifiers without the onerous and time-consuming task of annotating audio data. We propose a framework for audio-based activity recognition that makes use of millions of embedding features from public online video sound clips. Based on the combination of oversampling and deep learning approaches, our framework does not…
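The abstract mentions combining oversampling with deep learning but does not say which oversampling method is used. As an illustration only, the sketch below shows plain random oversampling, the simplest variant, applied to embedding vectors before training: minority-class rows are duplicated at random until every class matches the largest one. The function name `random_oversample`, the 128-dimensional embeddings, and the class counts are hypothetical assumptions; the paper may use a different technique.

```python
import numpy as np

def random_oversample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class. Original rows are kept first."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        deficit = target - count
        if deficit == 0:
            continue
        # Sample (with replacement) from the rows belonging to this class.
        idx = np.flatnonzero(y == cls)
        extra = rng.choice(idx, size=deficit, replace=True)
        X_parts.append(X[extra])
        y_parts.append(y[extra])
    return np.concatenate(X_parts), np.concatenate(y_parts)

# Hypothetical data: 10 clips with 128-dim embeddings and imbalanced activity labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(10, 128)).astype(np.float32)
y = np.array([0] * 6 + [1] * 3 + [2] * 1)
X_bal, y_bal = random_oversample(X, y)
print(X_bal.shape)         # (18, 128)
print(np.bincount(y_bal))  # [6 6 6]
```

Random duplication is cheap and keeps every sample a real embedding, but it can encourage overfitting to repeated minority examples; synthetic variants such as SMOTE interpolate new points instead.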
