WaBERT: A Low-resource End-to-end Model for Spoken Language Understanding and Speech-to-BERT Alignment
Lin Yao, Jianfei Song, Ruizhuo Xu, Yingfang Yang, Zijian Chen, and Yafeng Deng

TL;DR
WaBERT is an end-to-end spoken language understanding (SLU) model that combines a pre-trained speech model with a pre-trained language model. It improves sentiment analysis performance under low-resource training and yields a monotonic alignment between speech and text.
Contribution
The paper introduces WaBERT, a novel end-to-end model that combines pre-trained speech and language models through a modified Continuous Integrate-and-Fire (CIF) mechanism to improve SLU performance.
Findings
Improved recall and F1 scores on SLUE SA tasks.
Effective integration of audio-specific and language knowledge.
Monotonic alignment between speech and text achieved.
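To illustrate why a CIF-style mechanism gives a monotonic speech-to-text alignment, here is a minimal sketch of the standard Continuous Integrate-and-Fire procedure. It is not the modified variant used in WaBERT, whose details are not given on this page. The function name `cif`, the threshold of 1.0, and the assumption that each per-frame weight lies in [0, 1] are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple


def cif(
    frames: List[List[float]],
    weights: List[float],
    threshold: float = 1.0,
) -> Tuple[List[List[float]], List[int]]:
    """Standard CIF sketch (hypothetical helper, not the paper's code).

    Accumulates weighted frame vectors left to right and emits ("fires")
    one token vector each time the accumulated weight reaches `threshold`.
    Assumes every weight is in [0, threshold], so a frame fires at most once.
    Returns the token vectors and the frame index at which each one fired.
    """
    dim = len(frames[0]) if frames else 0
    acc_weight = 0.0
    acc_vec = [0.0] * dim
    tokens: List[List[float]] = []
    fire_frames: List[int] = []

    for t, (h, a) in enumerate(zip(frames, weights)):
        if acc_weight + a < threshold:
            # Not enough weight yet: keep integrating this frame.
            acc_weight += a
            acc_vec = [v + a * x for v, x in zip(acc_vec, h)]
            continue
        # Threshold reached: spend only the weight needed to complete the token.
        used = threshold - acc_weight
        tokens.append([v + used * x for v, x in zip(acc_vec, h)])
        fire_frames.append(t)
        # The leftover weight of this frame starts the next token.
        rest = a - used
        acc_weight = rest
        acc_vec = [rest * x for x in h]

    return tokens, fire_frames


if __name__ == "__main__":
    toy_frames = [[1.0], [3.0], [2.0], [4.0]]
    toy_weights = [0.5, 0.5, 0.25, 0.75]
    print(cif(toy_frames, toy_weights))
```

Because frames are consumed strictly left to right and each token fires at a single frame index, the firing positions can only increase, which is what makes the alignment monotonic. Any weight left over after the last firing is discarded in this sketch.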
Abstract
Historically, lower-level tasks such as automatic speech recognition (ASR) and speaker identification have been the main focus in the speech field. Recently, interest has been growing in higher-level spoken language understanding (SLU) tasks such as sentiment analysis (SA). However, improving performance on SLU tasks remains a big challenge. There are two main methods for SLU tasks: (1) the two-stage method, which uses a speech model to transcribe speech to text and then uses a language model to obtain the results of downstream tasks; and (2) the one-stage method, which simply fine-tunes a pre-trained speech model on the downstream tasks. The first method loses emotional cues such as intonation and introduces recognition errors during the ASR process, while the second lacks the necessary language knowledge. In this paper, we propose Wave BERT (WaBERT), a novel end-to-end model combining the speech model…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Sentiment Analysis and Opinion Mining
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Residual Connection · Dense Connections · Attention Dropout · Softmax · WordPiece
