Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding

Jiliang Hu; Zuchao Li; Mengjia Shen; Haojun Ai; Sheng Li; Jun Zhang

arXiv:2501.07329·cs.SD·January 20, 2025

TL;DR

This paper introduces a joint speech recognition and structure learning framework (JSRSL) that transcribes speech and extracts structured content simultaneously, outperforming the traditional sequence-to-sequence approach on the AISHELL-NER and SLURP datasets.

Contribution

The paper presents JSRSL, an end-to-end, span-based model for joint speech recognition and structure learning that transcribes speech and extracts structured content in a single pass.

Findings

01. Outperforms traditional sequence-to-sequence methods in transcription accuracy

02. Achieves state-of-the-art results on AISHELL-NER and SLURP datasets

03. Effectively extracts structured content during speech recognition

Abstract

Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently, many works that treat SLU as a sequence-to-sequence task have achieved great success. However, this approach is not well suited to simultaneous speech recognition and understanding. In this paper, we propose a joint speech recognition and structure learning framework (JSRSL), an end-to-end, span-based SLU model that can accurately transcribe speech and extract structured content simultaneously. We conduct experiments on named entity recognition and intent classification using the Chinese dataset AISHELL-NER and the English dataset SLURP. The results show that our proposed method not only outperforms the traditional sequence-to-sequence method in both transcription and extraction capabilities but also achieves state-of-the-art performance on the two datasets.
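To make the contrast in the abstract concrete, here is a minimal sketch of how a span-based output might differ from a sequence-to-sequence output with inline markup. It is an illustration only, not the JSRSL implementation: the `Span` class, the helper names, the bracket markup, the example utterance, and the labels are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: character offsets into a clean transcript."""
    start: int  # inclusive
    end: int    # exclusive
    label: str


def extract(transcript: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Read structured content off the transcript without altering it."""
    return [(transcript[s.start:s.end], s.label) for s in spans]


def to_inline_tags(transcript: str, spans: List[Span]) -> str:
    """Build the kind of tagged string a seq2seq model would have to generate.

    Assumes spans do not overlap. The bracket markup is an illustrative
    choice, not a format taken from the paper.
    """
    pieces = []
    cursor = 0
    for s in sorted(spans, key=lambda s: s.start):
        pieces.append(transcript[cursor:s.start])
        pieces.append(f"[{s.label} {transcript[s.start:s.end]}]")
        cursor = s.end
    pieces.append(transcript[cursor:])
    return "".join(pieces)


# Invented example utterance and labels.
transcript = "wake me up at seven in london"
spans = [Span(14, 19, "time"), Span(23, 29, "place")]

print(extract(transcript, spans))
print(to_inline_tags(transcript, spans))
```

In the span-based form the transcript stays untouched and the structure lives alongside it, so transcription quality and extraction quality can be scored separately. In the inline form, the markup tokens are mixed into the text the model has to generate.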

Code & Models

Repositories

193746/jsrsl (PyTorch, official)

Models

🤗 Rinawell/JSRSL (model)
