LLaSM: Large Language and Speech Model

Yu Shu; Siwei Dong; Guangyao Chen; Wenhao Huang; Ruihua Zhang; Daochen Shi; Qiqi Xiang; Yemin Shi

arXiv:2308.15930·cs.CL·September 19, 2023·5 cites

TL;DR

LLaSM is a multi-modal speech-language model that follows speech-and-language instructions. The authors argue that speech is an important modality for human-AI interaction, and they support the work with a new speech instruction-following dataset and early experiments.

Contribution

The paper presents the first end-to-end trained large speech-language model with cross-modal conversational abilities and releases a new speech instruction-following dataset.

Findings

01. LLaSM follows combined speech-and-language instructions in cross-modal conversation.

02. Speech input gives users a more convenient and natural way to interact with the model.

03. Early experiments show promising performance.

Abstract

Multi-modal large language models have garnered significant interest recently. However, most of this work focuses on vision-language multi-modal models, which provide strong capabilities in following vision-and-language instructions. We argue that speech is also an important modality through which humans interact with the world. Hence, it is crucial for a general-purpose assistant to be able to follow multi-modal speech-and-language instructions. In this work, we propose the Large Language and Speech Model (LLaSM). LLaSM is an end-to-end trained large multi-modal speech-language model with cross-modal conversational abilities, capable of following speech-and-language instructions. Our early experiments show that LLaSM demonstrates a more convenient and natural way for humans to interact with artificial intelligence. Specifically, we also release a large Speech Instruction Following dataset…

Code & Models

Repositories

linksoul-ai/llasm
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Speech and dialogue systems
