DCASE 2018 Challenge: Solution for Task 5
Jeremy Chew, Yingxiang Sun, Lahiru Jayasinghe, Chau Yuen

TL;DR
This paper presents an ensemble learning system that combines CNN and LSTM models to classify domestic activities, reaching an F1-score of 92.19% on the DCASE 2018 Task 5 development dataset, 7.69% above the baseline system.
Contribution
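The paper introduces an ensemble of three models, based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent neural networks, that uses spectrogram and MFCC features from multiple channels to classify domestic activities.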
Findings
F1-score of 92.19% on the development dataset
7.69% improvement in F1-score over the baseline system
Effective classification of domestic activities
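For readers unfamiliar with the metric, the following is a minimal sketch of how an F1-score over several activity classes can be computed. It assumes macro-averaging (the unweighted mean of per-class F1 scores); the summary above does not state which averaging the authors used, and the function name `macro_f1` and the toy labels are hypothetical.

```python
import numpy as np

def macro_f1(y_true, y_pred, num_classes):
    """Macro-averaged F1: unweighted mean of the per-class F1 scores.

    Assumption: macro-averaging is used here for illustration only.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Toy example with three hypothetical activity classes
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, num_classes=3)
print(f"macro F1 = {score:.4f}")
```

In this toy example the per-class F1 scores are 0.5, 0.8, and 2/3, so a single weak class pulls the macro average down regardless of how many clips it contains.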
Abstract
To address Task 5 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge, we propose an ensemble learning system. The proposed system consists of three different models based on convolutional neural networks (CNN) and long short-term memory (LSTM) recurrent neural networks. Using features such as spectrograms and mel-frequency cepstral coefficients (MFCCs) extracted from different channels, the proposed system classifies different domestic activities effectively. Experimental results on the provided development dataset show that an F1-score of 92.19% can be achieved. Compared with the baseline system, the proposed system significantly improves the F1-score by 7.69%.
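The abstract does not say how the three models' outputs are combined. As one plausible illustration, the sketch below uses soft voting, that is, a weighted average of each model's class probabilities followed by an argmax. The function `soft_vote`, the equal default weights, and the probability values are all assumptions made for this example, not the authors' stated method.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Combine per-model class probabilities by weighted averaging.

    prob_list: list of arrays, each of shape (n_clips, n_classes)
    weights:   optional per-model weights (assumed equal if omitted)
    Returns the averaged probabilities and the predicted class per clip.
    """
    probs = np.stack(prob_list, axis=0)          # (n_models, n_clips, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize so weights sum to 1
    avg = np.tensordot(weights, probs, axes=1)   # (n_clips, n_classes)
    return avg, avg.argmax(axis=1)

# Hypothetical outputs of three models for 2 clips and 3 classes
model_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
model_b = np.array([[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]])
model_c = np.array([[0.5, 0.2, 0.3], [0.2, 0.2, 0.6]])

avg, labels = soft_vote([model_a, model_b, model_c])
print(labels)
```

Averaging probabilities rather than taking a majority vote over hard labels lets a confident model outweigh two uncertain ones, which is one common reason to prefer soft voting in small ensembles.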
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
