Enhanced Factored Three-Way Restricted Boltzmann Machines for Speech Detection

Pengfei Sun; Jun Qin

arXiv:1611.00326·cs.SD·April 24, 2017·5 cites

Open Access

TL;DR

This paper introduces an enhanced factored three-way restricted Boltzmann machine model for speech detection, leveraging conditional feature learning, long-term feature capture, and parameter reduction techniques to improve performance in noisy environments.

Contribution

The paper proposes a novel EFTW-RBM model with conditional feature learning and parameter reduction, outperforming existing speech detection algorithms in noisy conditions.

Findings

01. Outperforms existing algorithms in noisy environments.

02. Achieves higher AUC and SDR scores.

03. Effectively captures long-term speech features.
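
For readers unfamiliar with the AUC metric named in the findings, the following is a minimal sketch of how an area-under-ROC-curve score can be computed for a frame-level speech detector. It uses the rank interpretation of AUC (the probability that a randomly chosen speech frame receives a higher score than a randomly chosen non-speech frame). The function name `roc_auc` and the example scores and labels are hypothetical and do not come from the paper.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the fraction of (speech, non-speech) frame pairs ranked correctly.

    Ties between a speech and a non-speech score count as half a correct pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]    # scores of frames labelled as speech
    neg = scores[~labels]   # scores of frames labelled as non-speech
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one speech and one non-speech frame")
    # Pairwise score differences between every speech and non-speech frame.
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

# Hypothetical detector outputs and ground-truth labels (1 = speech).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

An AUC of 1.0 means every speech frame outscores every non-speech frame, and 0.5 corresponds to chance-level ranking. The pairwise form is quadratic in the number of frames, so it is suited to illustration rather than large evaluations.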

Abstract

In this letter, we propose enhanced factored three-way restricted Boltzmann machines (EFTW-RBMs) for speech detection. The proposed model incorporates conditional feature learning by multiplying the dynamical state of the third unit, which allows modulation of the visible-hidden node pairs. Instead of recursively stacking previous speech frames as the third unit, correlation-related weighting coefficients are assigned to the contextual neighboring frames. Specifically, a threshold function is designed to capture the long-term features and blend the globally stored speech structure. A factored low-rank approximation is introduced to reduce the parameters of the three-dimensional interaction tensor, on which a non-negative constraint is imposed to address the sparsity characteristic. The validations through the area-under-ROC-curve (AUC) and signal distortion ratio…
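
To make the parameter-reduction idea in the abstract concrete, here is a minimal NumPy sketch of a factored three-way interaction. It assumes the factorization commonly used in the factored and gated RBM literature, W[i, j, k] = sum_f Wv[i, f] * Wh[j, f] * Wz[k, f]; whether the paper uses exactly this form is an assumption, since the abstract does not give the equations. The layer sizes, the names `Wv`, `Wh`, `Wz`, and the use of `np.abs` to obtain non-negative factors are all illustrative choices, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: visible units, hidden units, third (conditioning) units, factors.
n_v, n_h, n_z, n_f = 40, 30, 20, 8

# Factor matrices. Taking absolute values keeps every entry non-negative,
# as a stand-in for the non-negative constraint mentioned in the abstract.
Wv = np.abs(rng.normal(size=(n_v, n_f)))
Wh = np.abs(rng.normal(size=(n_h, n_f)))
Wz = np.abs(rng.normal(size=(n_z, n_f)))

def factored_interaction(v, h, z):
    """Three-way term sum_f (v @ Wv)_f * (h @ Wh)_f * (z @ Wz)_f."""
    return float(np.sum((v @ Wv) * (h @ Wh) * (z @ Wz)))

def hidden_input(v, z):
    """Total input to each hidden unit; z multiplicatively gates the v-h pairing."""
    return Wh @ ((v @ Wv) * (z @ Wz))

v = rng.random(n_v)                              # visible frame features
h = rng.integers(0, 2, size=n_h).astype(float)   # binary hidden states
z = rng.random(n_z)                              # state of the third unit

# Reference computation with the explicit full tensor, for comparison only.
W_full = np.einsum("if,jf,kf->ijk", Wv, Wh, Wz)
full_term = float(np.einsum("ijk,i,j,k->", W_full, v, h, z))

params_full = n_v * n_h * n_z               # 24000 entries in the full tensor
params_factored = n_f * (n_v + n_h + n_z)   # 720 entries in the three factor matrices
```

With these illustrative sizes the factored form stores 720 weights instead of 24000 while producing the same three-way term as the explicit tensor built from the same factors. The saving grows as the layers get larger, because the full tensor scales with the product of the layer sizes and the factored form with their sum.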

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing