Improving Transducer-Based Spoken Language Understanding with Self-Conditioned CTC and Knowledge Transfer
Vishal Sunder, Eric Fosler-Lussier

TL;DR
This paper introduces a novel RNN transducer (RNN-T) model for spoken language understanding (SLU) that combines self-conditioned CTC with knowledge transfer from BERT, significantly improving SLU performance while using fewer parameters.
Contribution
It proposes a joint modeling approach integrating self-conditioned CTC with BERT-based knowledge transfer for improved end-to-end SLU.
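One way to read this joint modeling is as a single weighted training objective. The following is a sketch under stated assumptions, not the paper's published loss: the weights $\lambda$ are hypothetical, and the exact forms of the BERT alignment term and the bag-of-entity term are not given in this summary.

```latex
\mathcal{L} = \mathcal{L}^{\text{SLU}}_{\text{RNN-T}}
  + \lambda_{\text{CTC}}\,\mathcal{L}^{\text{ASR}}_{\text{CTC}}
  + \lambda_{\text{align}}\,\mathcal{L}_{\text{align}}
  + \lambda_{\text{BoE}}\,\mathcal{L}_{\text{BoE}}
```

Here $\mathcal{L}^{\text{SLU}}_{\text{RNN-T}}$ is the transducer loss on the SLU output, $\mathcal{L}^{\text{ASR}}_{\text{CTC}}$ is the self-conditioned CTC ASR loss, $\mathcal{L}_{\text{align}}$ is assumed to pull the acoustic embeddings toward BERT embeddings, and $\mathcal{L}_{\text{BoE}}$ is assumed to supervise the bag-of-entity prediction layer.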
Findings
Significantly improves SLU performance over the baselines.
Achieves results comparable to those of large models such as Whisper.
Reduces model size while maintaining high accuracy.
Abstract
In this paper, we propose to improve end-to-end (E2E) spoken language understanding (SLU) in an RNN transducer (RNN-T) model by incorporating a joint self-conditioned CTC automatic speech recognition (ASR) objective. Our proposed model is akin to an E2E differentiable cascaded model that performs ASR and SLU sequentially, and we ensure that the SLU task is conditioned on the ASR task through CTC self-conditioning. This novel joint modeling of ASR and SLU improves SLU performance significantly over optimizing for SLU alone. We further improve performance by aligning the acoustic embeddings of this model with the semantically richer BERT model. Our proposed knowledge transfer strategy applies a bag-of-entity prediction layer to the aligned embeddings, and its output is used to condition the RNN-T based SLU decoding. These techniques show significant improvement over…
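The two conditioning mechanisms in the abstract can be illustrated in pure Python. This is a minimal sketch of the generic techniques, not the authors' implementation: `self_condition` follows the usual self-conditioned CTC pattern (per-frame CTC posteriors are projected back to the hidden size and added to the hidden states), and `bag_of_entity_probs` is a generic multi-label classifier (mean pooling followed by independent sigmoids). The function names, weight matrices, pooling choice, and toy shapes are all hypothetical, and no CTC or RNN-T loss is computed.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # shape: (out_dim, in_dim)


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by a length-in_dim vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over one frame's logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def self_condition(
    frames: List[Vector], w_ctc: Matrix, w_back: Matrix
) -> Tuple[List[Vector], List[Vector]]:
    """One self-conditioning step over a sequence of hidden frames.

    w_ctc  (hypothetical): hidden_dim -> vocab_size, vocab includes the CTC blank.
    w_back (hypothetical): vocab_size -> hidden_dim.
    Returns (conditioned_frames, ctc_posteriors). During training the
    posteriors would also drive an intermediate CTC loss (not computed here).
    """
    conditioned, posteriors = [], []
    for h in frames:
        z = softmax(matvec(w_ctc, h))  # per-frame CTC posteriors
        back = matvec(w_back, z)  # project posteriors back to hidden size
        conditioned.append([hi + bi for hi, bi in zip(h, back)])  # residual add
        posteriors.append(z)
    return conditioned, posteriors


def bag_of_entity_probs(frames: List[Vector], w_ent: Matrix) -> Vector:
    """Multi-label entity presence probabilities from mean-pooled frames.

    w_ent (hypothetical): hidden_dim -> num_entity_types. Each entity type gets
    an independent sigmoid, so several entities can be predicted at once.
    """
    dim = len(frames[0])
    pooled = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    return [sigmoid(s) for s in matvec(w_ent, pooled)]


# Toy example: 2 frames, hidden_dim = 2, vocab_size = 3 (row 0 of w_ctc = blank).
frames = [[1.0, 0.0], [0.0, 1.0]]
w_ctc = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
w_back = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]
conditioned, posteriors = self_condition(frames, w_ctc, w_back)
# The same toy frames stand in for the BERT-aligned embeddings here.
entity_probs = bag_of_entity_probs(frames, [[1.0, 1.0], [-1.0, -1.0]])
```

In a trained model the weight matrices would be learned, and the conditioned frames would feed the subsequent layers. The residual addition lets those layers see the intermediate ASR hypothesis without discarding the original hidden representation; how the paper feeds the bag-of-entity output into RNN-T decoding is not specified in this summary.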
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Layer Normalization · Dense Connections · Attention Dropout · WordPiece · Dropout · Linear Layer · Softmax · Linear Warmup With Linear Decay
