Formant Tracking Using Dilated Convolutional Networks Through Dense Connection with Gating Mechanism

Wang Dai; Jinsong Zhang; Yingming Gao; Wei Wei; Dengfeng Ke; Binghuai Lin; Yanlu Xie

arXiv:2005.10803 · eess.AS · August 11, 2020 · 1 cites

TL;DR

This paper introduces a modified dilated convolutional network with dense connections and gating for formant tracking in speech processing. On a standard dataset it outperforms LSTM and Bi-LSTM recurrent baselines.

Contribution

The study proposes a TCN architecture with dense connections and a gating mechanism, tailored to formant tracking, and reports better performance than the LSTM and Bi-LSTM baselines.

Findings

1. Achieved a mean absolute percentage error (MAPE) of 8.2% on formant tracking.

2. The model converges easily and outperforms the LSTM and Bi-LSTM baselines.

3. Dense connections and gating can be used effectively in a TCN for speech tasks.
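
For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how a MAPE score can be computed over paired formant frequencies. The function name `mape` and the sample F1 values are hypothetical and chosen only for illustration; the paper's exact evaluation protocol (which frames and formants are included) is not specified on this page.

```python
def mape(reference_hz, estimated_hz):
    """Mean absolute percentage error, in percent, over paired frames."""
    if len(reference_hz) != len(estimated_hz) or not reference_hz:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each frame contributes its absolute error relative to the reference value.
    total = sum(abs(est - ref) / ref
                for ref, est in zip(reference_hz, estimated_hz))
    return 100.0 * total / len(reference_hz)


# Hypothetical F1 values in Hz for four frames (illustrative, not from the paper).
reference = [500.0, 520.0, 480.0, 510.0]
estimate = [550.0, 520.0, 432.0, 510.0]
score = mape(reference, estimate)
```

In this toy case two of the four frames are off by 10% and two are exact, so the score comes out at 5%.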

Abstract

Formant tracking is one of the most fundamental problems in speech processing. Traditionally, formants are estimated using signal processing methods. Recent studies showed that generic convolutional architectures can outperform recurrent networks on temporal tasks such as speech synthesis and machine translation. In this paper, we explored the use of a Temporal Convolutional Network (TCN) for formant tracking. In addition to the conventional implementation, we modified the architecture in three respects. First, we turned off the "causal" mode of dilated convolution, letting the dilated convolution see future speech frames. Second, each hidden layer reused the output information from all the previous layers through dense connection. Third, we also adopted a gating mechanism to alleviate the vanishing-gradient problem by selectively forgetting unimportant information. The model…
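
The three modifications described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the gate is assumed to be a WaveNet-style `tanh` filter multiplied by a `sigmoid` gate, and the helper names (`dilated_conv1d`, `gated_layer`, `dense_gated_stack`), channel counts, kernel size, and dilation schedule are all hypothetical.

```python
import numpy as np


def dilated_conv1d(x, weight, dilation, causal=False):
    """x: (channels_in, frames); weight: (channels_out, channels_in, kernel)."""
    c_out, _, kernel = weight.shape
    span = dilation * (kernel - 1)
    # Causal mode pads only the past; non-causal mode pads both sides,
    # so the kernel taps also reach future frames.
    left = span if causal else span // 2
    padded = np.pad(x, ((0, 0), (left, span - left)))
    frames = x.shape[1]
    out = np.zeros((c_out, frames))
    for tap in range(kernel):
        start = tap * dilation
        out += weight[:, :, tap] @ padded[:, start:start + frames]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_layer(x, w_filter, w_gate, dilation):
    # Assumed gate form: tanh "filter" scaled by a sigmoid "gate" in (0, 1).
    filt = np.tanh(dilated_conv1d(x, w_filter, dilation))
    gate = sigmoid(dilated_conv1d(x, w_gate, dilation))
    return filt * gate


def dense_gated_stack(x, layers):
    # Dense connection: each layer sees the input plus all earlier outputs.
    features = [x]
    for w_filter, w_gate, dilation in layers:
        dense_in = np.concatenate(features, axis=0)
        features.append(gated_layer(dense_in, w_filter, w_gate, dilation))
    return np.concatenate(features, axis=0)


# Hypothetical sizes: 4 input features, 3 new channels per layer, 20 frames.
rng = np.random.default_rng(0)
in_ch, growth, kernel, frames = 4, 3, 3, 20
layers = []
for i, dilation in enumerate([1, 2, 4]):
    c_in = in_ch + i * growth  # input width grows with every dense connection
    layers.append((0.1 * rng.normal(size=(growth, c_in, kernel)),
                   0.1 * rng.normal(size=(growth, c_in, kernel)),
                   dilation))
x = rng.normal(size=(in_ch, frames))
y = dense_gated_stack(x, layers)  # shape (in_ch + 3 * growth, frames)
```

A real tracker would add a regression head that maps the stacked features to formant frequencies per frame and would train the weights; both are omitted here to keep the focus on the non-causal padding, the concatenation of earlier outputs, and the gate.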

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing