Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos

Dawei Liang; Edison Thomaz

arXiv:1810.08691 · cs.HC · April 9, 2019

TL;DR

This paper presents an audio-based activity recognition framework built on large-scale acoustic embeddings from public online videos. It reaches 64.2% top-1 and 83.6% top-3 accuracy on activities of daily living without extensive manual annotation of audio data.

Contribution

The study introduces a scalable approach leveraging public video embeddings and deep learning for activity recognition, reducing the need for manual audio data annotation.

Findings

01. Achieved 64.2% top-1 accuracy in ADL recognition
02. Achieved 83.6% top-3 accuracy in ADL recognition
03. Demonstrated robustness and analyzed the co-occurrence of activities
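For readers unfamiliar with the two metrics above: top-k accuracy counts a prediction as correct when the true class is among the k highest-scoring classes. The sketch below illustrates only that general definition. The function name `top_k_accuracy`, the toy scores, and the four-class setup are assumptions made for illustration, not the paper's evaluation code or data.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Rank class indices by descending score and keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical classifier scores for 4 clips over 4 made-up activity classes.
toy_scores = [
    [0.70, 0.10, 0.15, 0.05],  # true class 0 is ranked 1st
    [0.20, 0.50, 0.25, 0.05],  # true class 2 is ranked 2nd
    [0.40, 0.30, 0.20, 0.10],  # true class 3 is ranked 4th
    [0.05, 0.15, 0.20, 0.60],  # true class 3 is ranked 1st
]
toy_labels = [0, 2, 3, 3]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top3 = top_k_accuracy(toy_scores, toy_labels, k=3)
print(f"top-1 = {top1:.2f}, top-3 = {top3:.2f}")  # top-1 = 0.50, top-3 = 0.75
```

Top-3 accuracy is never lower than top-1 on the same predictions, which is consistent with the gap between the two reported figures.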

Abstract

Over the years, activity sensing and recognition have been shown to play a key enabling role in a wide range of applications, from sustainability and human-computer interaction to health care. While many recognition tasks have traditionally employed inertial sensors, acoustic-based methods offer the benefit of capturing rich contextual information, which can be useful when discriminating complex activities. Given the emergence of deep learning techniques and of new, large-scale multimedia datasets, this paper revisits the opportunity of training audio-based classifiers without the onerous and time-consuming task of annotating audio data. We propose a framework for audio-based activity recognition that makes use of millions of embedding features from public online video sound clips. Based on the combination of oversampling and deep learning approaches, our framework does not…
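The abstract mentions oversampling but, as excerpted here, does not say which variant is used. As a rough illustration of the general idea, the sketch below shows plain random oversampling: examples from smaller classes are duplicated at random until every class matches the largest one. The function `random_oversample`, the two-dimensional toy vectors, and the labels `class_a` and `class_b` are all hypothetical and are not taken from the paper or its repository.

```python
import random
from collections import Counter, defaultdict


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Sample with replacement to fill the gap to the majority-class size.
        extras = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extras)
        out_y.extend([y] * target)
    return out_x, out_y


# Toy stand-ins for embedding vectors; real embeddings would be much longer.
toy_x = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.4, 0.1], [0.0, 0.5],
         [0.9, 0.8], [0.8, 0.9]]
toy_y = ["class_a"] * 5 + ["class_b"] * 2

balanced_x, balanced_y = random_oversample(toy_x, toy_y)
print(Counter(balanced_y))
```

Balancing should be applied to the training split only, so that duplicated examples do not leak into evaluation data.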

Code & Models

Repositories

dawei-liang/AudioAR_Research_Codes (official; TensorFlow)
