LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models

Zihan Zhao; Yiyang Jiang; Heyang Liu; Yanfeng Wang; Yu Wang

arXiv:2308.10390 · cs.CL · April 19, 2024

TL;DR

This paper introduces LibriSQA, a new dataset and framework for spoken question answering using large language models, demonstrating improved multimodal understanding and alignment between speech and text.

Contribution

The paper presents the LibriSQA dataset, comprising 107k SQA pairs curated from Librispeech, and a lightweight, end-to-end framework for running the SQA task on it with LLMs.

Findings

01

The proposed lightweight, end-to-end framework achieves significant results on the SQA task on LibriSQA.

02

Reformulating ASR as SQA enhances speech-text understanding.

03

Empirical evidence shows improved multimodal alignment.

Abstract

While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features. To address the SQA challenge on LLMs, we initially curated the free-form and open-ended LibriSQA dataset from Librispeech, comprising Part I with natural conversational formats and Part II encompassing multiple-choice questions followed by answers and analytical segments. Both parts collectively include 107k SQA pairs that cover various topics. Given the evident paucity of existing speech-text LLMs, we propose a lightweight, end-to-end framework to execute the SQA task on the LibriSQA, witnessing significant results. By reforming ASR into the…
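The two-part layout described in the abstract can be pictured with a small record type. This is only an illustrative sketch: the class name `SQAPair`, its field names, the `validate` helper, and the sample values are all hypothetical and are not taken from the LibriSQA release, whose actual file format may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SQAPair:
    """Hypothetical record for one SQA pair (all field names are assumptions)."""
    audio_id: str                         # assumed key pointing at a Librispeech utterance
    question: str
    answer: str
    options: Optional[List[str]] = None   # multiple-choice options, Part II only
    analysis: Optional[str] = None        # analytical segment, Part II only

    @property
    def part(self) -> str:
        # Part II pairs carry multiple-choice options; Part I pairs are free-form.
        return "II" if self.options is not None else "I"


def validate(pair: SQAPair) -> None:
    """Check the assumed invariants for each part; raise ValueError if violated."""
    if not pair.question or not pair.answer:
        raise ValueError("every pair needs a question and an answer")
    if pair.part == "II":
        if len(pair.options) < 2:
            raise ValueError("a multiple-choice pair needs at least two options")
        if not pair.analysis:
            raise ValueError("a Part II pair needs an analysis segment")


# Placeholder values for illustration only, not real dataset entries.
part_one = SQAPair("utt-0001", "What is the speaker describing?", "A journey by sea.")
part_two = SQAPair(
    "utt-0002",
    "Where does the scene take place?",
    "B",
    options=["A. In a forest", "B. On a ship"],
    analysis="The speech mentions the deck and the sails.",
)
validate(part_one)
validate(part_two)
```

Keeping both parts in one type with optional fields is a convenience for the sketch; two separate types would work equally well.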

Code & Models

Repositories

zihanzhaosjtu/librisqa
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems