Sequential Routing Framework: Fully Capsule Network-based Speech Recognition

Kyungmin Lee; Hyunwhan Joe; Hyeontaek Lim; Kwangyoun Kim; Sungsoo Kim; Chang Woo Han; Hong-Gee Kim

arXiv:2007.11747 · eess.AS · April 2, 2021

TL;DR

This paper introduces a fully capsule network (CapsNet)-based framework for speech recognition that combines sequential routing with a sequential dynamic routing algorithm, achieving lower error rates than the baseline models it is compared against.

Contribution

To the authors' knowledge, it is the first work to adapt a CapsNet-only structure to sequence-to-sequence speech recognition, introducing a sequential routing method and a routing algorithm that can operate non-iteratively.
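For orientation, here is a minimal pure-Python sketch of conventional dynamic routing (routing-by-agreement, Sabour et al., 2017), the "traditional dynamic routing" whose cost grows with the number of routing iterations. The `sequential_routing` helper is a hypothetical illustration of how a non-iterative variant could work: it runs a single routing pass per slice and initializes each slice's routing logits from the previous slice. That carry-over rule is an assumption made for illustration, not the authors' algorithm; the sephiroce/srf repository holds the actual implementation.

```python
import math
from typing import List, Optional

Vec = List[float]


def squash(s: Vec, eps: float = 1e-9) -> Vec:
    """Capsule non-linearity: keeps the direction of s and maps its norm into [0, 1)."""
    sq = sum(x * x for x in s)
    scale = sq / (1.0 + sq) / math.sqrt(sq + eps)
    return [scale * x for x in s]


def softmax(row: Vec) -> Vec:
    m = max(row)
    e = [math.exp(x - m) for x in row]
    z = sum(e)
    return [x / z for x in e]


def dynamic_routing(u_hat, iters: int, b: Optional[List[Vec]] = None):
    """Routing-by-agreement in the style of Sabour et al. (2017).

    u_hat[i][j] is the prediction vector from input capsule i for output capsule j.
    b holds the routing logits (zeros if None); a copy is updated and returned.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)] if b is None else [row[:] for row in b]
    v: List[Vec] = []
    for _ in range(iters):
        c = [softmax(b[i]) for i in range(n_in)]  # coupling coefficients of input capsule i
        s = [[sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
             for j in range(n_out)]
        v = [squash(s_j) for s_j in s]
        for i in range(n_in):  # raise logits where a prediction agrees with the output
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, b


def sequential_routing(slices, b0: Optional[List[Vec]] = None):
    """HYPOTHETICAL sequential variant (not the authors' code): one routing pass
    per slice, with the logits left by slice t-1 used to initialize slice t."""
    outputs, b = [], b0
    for u_hat in slices:
        v, b = dynamic_routing(u_hat, iters=1, b=b)
        outputs.append(v)
    return outputs
```

With `iters=r`, every slice pays for `r` rounds of agreement updates at decoding time; the hypothetical sequential variant pays for one round per slice and lets the refinement accumulate across time steps instead.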

Findings

01

Achieves a 1.1% lower WER on the WSJ corpus.

02

Attains a 0.7% lower PER on TIMIT.

03

Reduces the decoding-speed degradation caused by routing iterations through the proposed routing algorithm.
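WER and PER, the metrics in findings 01 and 02, are both edit-distance error rates: the minimum number of substitutions, deletions, and insertions needed to turn the reference into the hypothesis, divided by the number of reference tokens (words for WER, phones for PER). The sketch below implements that standard definition; the function names are illustrative, and official scoring tools may additionally normalize the text before aligning it.

```python
from typing import Hashable, List, Sequence


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance: fewest substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def error_rate(refs: List[Sequence[Hashable]], hyps: List[Sequence[Hashable]]) -> float:
    """Corpus-level rate: total edits divided by total reference tokens."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat".split()
print(error_rate([ref], [hyp]))  # 2 edits / 6 reference words = 0.333...
```

The same functions give PER when the sequences are phone labels instead of words.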

Abstract

Capsule networks (CapsNets) have recently attracted attention as a novel neural architecture. This paper presents the sequential routing framework, which we believe is the first method to adapt a CapsNet-only structure to sequence-to-sequence recognition. Input sequences are capsulized and then sliced according to a window size. Each slice is classified into a label at the corresponding time step through iterative routing mechanisms. The losses are then computed with connectionist temporal classification (CTC). Because the learnable weights are shared across slices, the number of parameters required for routing is controlled by the window size regardless of the sequence length. We additionally propose a sequential dynamic routing algorithm to replace traditional dynamic routing. The proposed technique can minimize the decoding-speed degradation caused by routing iterations, since it can operate in a…
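The abstract's point about parameters can be made concrete with a small sketch: because one set of transformation weights is reused for every slice, the routing layer's size is fixed by the window, not by the sequence length T. All shapes here are hypothetical assumptions: an odd, centred window with stride 1 and zero padding (one slice per time step), and one `out_dim x in_dim` transform per (input capsule in the window, output label capsule) pair, as in conventional capsule layers. The paper's exact slicing and tensor shapes may differ.

```python
from typing import List

Capsule = List[float]
Frame = List[Capsule]  # the capsules produced for one time step


def slice_sequence(frames: List[Frame], window: int) -> List[List[Frame]]:
    """Return one window of `window` frames centred on each time step (zero-padded edges)."""
    assert window % 2 == 1, "this sketch assumes an odd window"
    half = window // 2
    n_caps, dim = len(frames[0]), len(frames[0][0])
    pad: Frame = [[0.0] * dim for _ in range(n_caps)]
    padded = [pad] * half + list(frames) + [pad] * half
    return [padded[t:t + window] for t in range(len(frames))]


def routing_param_count(window: int, caps_per_frame: int, in_dim: int,
                        n_labels: int, out_dim: int) -> int:
    """Weights shared by every slice: one out_dim x in_dim matrix per (input capsule, label)."""
    return window * caps_per_frame * n_labels * out_dim * in_dim


def predictions(window_slice: List[Frame], weights) -> List[List[Capsule]]:
    """u_hat[i][j] = weights[i][j] @ u_i; the same `weights` is used for every slice."""
    caps = [c for frame in window_slice for c in frame]  # flatten the window into input capsules
    return [[[sum(w * x for w, x in zip(row, u)) for row in weights[i][j]]
             for j in range(len(weights[i]))]
            for i, u in enumerate(caps)]


# A 5-frame toy sequence with 2 capsules of dimension 3 per frame.
frames = [[[float(t)] * 3, [1.0] * 3] for t in range(5)]
slices = slice_sequence(frames, window=3)
print(len(slices))                         # 5: one slice per time step
print(routing_param_count(3, 2, 3, 4, 8))  # 576, whatever the sequence length
```

Doubling T doubles the number of slices (and hence the routing work) but leaves the shared weight count unchanged; only enlarging the window grows it.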

Code & Models

Repositories

sephiroce/srf (TensorFlow)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing