Bottleneck Low-rank Transformers for Low-resource Spoken Language Understanding
Pu Wang, Hugo Van hamme

TL;DR
This paper introduces a compact low-rank transformer architecture with attention bottlenecks that enables efficient spoken language understanding in low-resource scenarios without pretraining, reaching accuracy competitive with large pretrained models.
Contribution
It proposes a novel lean transformer structure with attention bottlenecks and group sparsity, reducing model size while maintaining performance in low-resource SLU tasks.
Findings
Achieves competitive accuracy without pretraining
Reduces model size significantly
Effective in low-resource settings
Abstract
End-to-end spoken language understanding (SLU) systems benefit from pretraining on large corpora, followed by fine-tuning on application-specific data. The resulting models are too large for on-edge applications. For instance, BERT-based systems contain over 110M parameters. Observing that the model is overparameterized, we propose a lean transformer structure where the dimension of the attention mechanism is automatically reduced using group sparsity. We propose a variant where the learned attention subspace is transferred to an attention bottleneck layer. In a low-resource setting and without pre-training, the resulting compact SLU model achieves accuracies competitive with pre-trained large models.
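As a rough illustration of how group sparsity could shrink the attention dimension, the sketch below applies a group-lasso style penalty in which each attention dimension forms one group. It is not taken from the paper: the grouping (column k of the query and key projections together), the pruning threshold, and the names `group_lasso_penalty` and `prune_attention_dims` are all assumptions made for this example.

```python
import numpy as np

def group_lasso_penalty(w_q, w_k):
    """Return (penalty, group_norms) for query/key projections.

    ASSUMPTION: one group per attention dimension, i.e. column k of
    w_q together with column k of w_k. The paper may group differently.
    """
    # L2 norm of each group: shape (d_attn,)
    group_norms = np.sqrt((w_q ** 2).sum(axis=0) + (w_k ** 2).sum(axis=0))
    # Group lasso: sum of (unsquared) group norms, which pushes whole
    # groups toward exactly zero during training.
    return group_norms.sum(), group_norms

def prune_attention_dims(w_q, w_k, threshold=1e-3):
    """Drop attention dimensions whose group norm is below threshold.

    The threshold value is hypothetical, chosen only for this sketch.
    """
    _, group_norms = group_lasso_penalty(w_q, w_k)
    keep = group_norms > threshold
    return w_q[:, keep], w_k[:, keep], keep

# Toy projections: d_model = 8, attention dimension = 6.
rng = np.random.default_rng(0)
w_q = rng.normal(size=(8, 6))
w_k = rng.normal(size=(8, 6))

# Pretend training with the penalty has zeroed out dimensions 1 and 4.
w_q[:, [1, 4]] = 0.0
w_k[:, [1, 4]] = 0.0

penalty, _ = group_lasso_penalty(w_q, w_k)
w_q_small, w_k_small, keep = prune_attention_dims(w_q, w_k)
```

In this toy setup the two zeroed dimensions are removed, leaving projections of shape (8, 4). In an actual training loop the penalty would be added to the task loss with a weighting factor, which is likewise not specified here.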
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech Recognition and Synthesis · Topic Modeling
