Combination of Time-domain, Frequency-domain, and Cepstral-domain Acoustic Features for Speech Commands Classification

Yikang Wang; Hiromitsu Nishizaki

arXiv:2203.16085·cs.SD·June 20, 2022·1 cites

Open Access

TL;DR

This paper explores combining time-domain, frequency-domain, and cepstral-domain features, including the novel BSR-float16 feature, to improve speech command classification accuracy and noise robustness.

Contribution

It introduces BSR-float16, a more precise time-domain feature, and demonstrates the effectiveness of feature fusion for improved classification and noise robustness.

Findings

01

Fusion of features improves classification accuracy.

02

BSR-float16 represents floating-point values more precisely than the original BSR.

03

Feature combination enhances noise robustness.

Abstract

In speech-related classification tasks, frequency-domain acoustic features such as logarithmic Mel-filter bank coefficients (FBANK) and cepstral-domain acoustic features such as Mel-frequency cepstral coefficients (MFCC) are often used. However, time-domain features perform more effectively in some sound classification tasks that contain non-vocal or weakly speech-related sounds. We previously proposed a feature called bit sequence representation (BSR), a time-domain binary acoustic feature based on the raw waveform. Compared with MFCC, BSR performed better in environmental sound detection and showed comparable accuracy in limited-vocabulary speech recognition tasks. In this paper, we propose an improved BSR feature, called BSR-float16, that represents floating-point values more precisely. We experimentally demonstrated the complementarity among time-domain,…
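To make the idea of a bit sequence representation concrete, here is a minimal sketch. The abstract does not spell out the exact encoding, so this assumes that each waveform sample is cast to IEEE 754 half precision and expanded into its 16 bits (sign, exponent, mantissa). The function name `bsr_float16` and the bit ordering are illustrative choices, not the authors' implementation.

```python
import numpy as np

def bsr_float16(waveform):
    """Hypothetical sketch: expand each sample into its 16 float16 bits.

    Assumes the input is a 1-D sequence of floating-point samples
    (for example, a waveform normalized to [-1, 1]).
    """
    # Cast to half precision; values outside the float16 range would overflow.
    half = np.asarray(waveform, dtype=np.float16).ravel()
    # Reinterpret the raw bits as uint16, stored big-endian so the
    # high byte (sign and upper exponent bits) comes first in memory.
    raw = half.view(np.uint16).astype(">u2")
    # Split into two bytes per sample and unpack each byte MSB first.
    bits = np.unpackbits(raw.view(np.uint8).reshape(-1, 2), axis=1)
    return bits  # shape: (num_samples, 16), values in {0, 1}

features = bsr_float16([1.0, -2.0, 0.0])
print(features.shape)
```

Each row could then serve as a 16-dimensional binary feature vector for one sample. How such vectors are framed and combined with FBANK or MFCC features is not described in this excerpt.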

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing