A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge
Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe

TL;DR
This paper presents a spoken semantic parsing system that combines pipeline and end-to-end approaches, using strong ASR models such as Whisper and pretrained language models such as BART to reach 80.8% exact match accuracy in the ICASSP 2023 SLU Grand Challenge.
Contribution
It introduces a novel integration of pipeline and E2E SLU systems with model output combination, achieving top performance in the ICASSP 2023 challenge.
Findings
Achieved an exact match accuracy of 80.8%
Won 1st place at the ICASSP Signal Processing Grand Challenge
Demonstrated effectiveness of combining ASR and pretrained language models
Abstract
Recently there have been efforts to introduce new benchmark tasks for spoken language understanding (SLU), such as semantic parsing. In this paper, we describe our proposed spoken semantic parsing system for the quality track (Track 1) of the Spoken Language Understanding Grand Challenge, which is part of the ICASSP Signal Processing Grand Challenge 2023. We experiment with both end-to-end and pipeline systems for this task. Strong automatic speech recognition (ASR) models like Whisper and pretrained language models (LMs) like BART are utilized inside our SLU framework to boost performance. We also investigate the output-level combination of various models to reach an exact match accuracy of 80.8%, which won 1st place at the challenge.
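The exact match metric quoted above can be illustrated with a minimal sketch. It assumes that a prediction counts as correct only when its parse string equals the reference parse after whitespace normalization. The function name, the normalization step, and the example parses are hypothetical and are not taken from the challenge's official scoring code.

```python
def exact_match_accuracy(predictions, references):
    """Return the percentage of predictions identical to their reference parse."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")

    # Assumption: collapse runs of whitespace so that spacing differences
    # alone are not counted as parse errors.
    def normalize(parse):
        return " ".join(parse.split())

    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)


# Hypothetical semantic parses in a bracketed intent/slot style.
references = [
    "[IN:GET_WEATHER what is the weather in [SL:LOCATION boston ] ]",
    "[IN:CREATE_ALARM set an alarm for [SL:DATE_TIME 7 am ] ]",
]
predictions = [
    "[IN:GET_WEATHER what is the weather in [SL:LOCATION boston ] ]",
    "[IN:CREATE_ALARM set an alarm for [SL:DATE_TIME 8 am ] ]",
]
score = exact_match_accuracy(predictions, references)
```

Because a single wrong token anywhere in the parse makes the whole example count as a miss, the metric is strict: the second prediction above differs only in the slot value and still scores zero for that utterance.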
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Adam · Layer Normalization · Linear Layer · Dropout · Byte Pair Encoding · Residual Connection
