Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling
Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed

TL;DR
This paper integrates connectionist temporal classification (CTC) with a variational model, deriving loss functions for training order-preserving sequence models that are more generalizable than their deterministic counterparts.
Contribution
The paper derives two variational CTC loss functions, one assuming the latent variables at each time step are conditionally independent and the other assuming they are Markovian, extending CTC from deterministic to variational sequence models.
Findings
Derived two variational CTC loss functions for order-preserving sequence modeling.
Provided computationally tractable forms for the proposed loss functions.
Demonstrated the theoretical feasibility of integrating variational models with CTC.
Abstract
Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks like speech recognition, where it is necessary to preserve order between the input and target sequences. However, CTC is only applied to deterministic sequence models, where the latent space is discontinuous and sparse, which in turn makes them less capable of handling data variability when compared to variational models. In this paper, we integrate CTC with a variational model and derive loss functions that can be used to train more generalizable sequence models that preserve order. Specifically, we derive two versions of the novel variational CTC based on two reasonable assumptions, the first being that the variational latent variables at each time step are conditionally independent; and the second being that these latent variables are Markovian. We show that both loss functions allow direct…
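For readers unfamiliar with the baseline the abstract builds on, the following is a minimal sketch of the standard (deterministic) CTC negative log-likelihood, computed with the usual forward recursion over a blank-interleaved target. It is not the variational loss derived in the paper; the function name `ctc_neg_log_likelihood`, the choice of index 0 for the blank symbol, and the list-of-lists input format are assumptions made for illustration only.

```python
import math

NEG_INF = float("-inf")


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(log_probs, target, blank=0):
    """Standard CTC loss for a single sequence (illustrative sketch).

    log_probs: list of T frames, each a list of V normalized log-probabilities.
    target:    list of label indices without blanks.
    blank:     index of the blank symbol (assumed to be 0 here).
    """
    # Interleave blanks: [blank, y1, blank, y2, ..., blank].
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)

    # Forward variables at t = 0: a path may start on the first blank
    # or on the first label.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]                      # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])          # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = _logsumexp(cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    ends = [alpha[S - 1]]
    if S > 1:
        ends.append(alpha[S - 2])
    return -_logsumexp(ends)
```

The recursion only permits monotonic moves through the extended target, which is what enforces the order preservation the abstract refers to. A variational variant would, per the abstract, place latent variables at each time step; how those enter the loss is specified in the paper and is not reproduced here.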
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
