SparQLe: Speech Queries to Text Translation Through LLMs

Amirbek Djanibekov; Hanan Aldarmaki

arXiv:2502.09284 · cs.CL · June 2, 2025

Open Access · 1 Repo · 1 Model

TL;DR

This paper presents SparQLe, a method that connects self-supervised speech representations to instruction-tuned LLMs through a modality adapter, enabling speech-to-text translation that preserves the semantic content of the input speech.

Contribution

The paper introduces an approach that uses a modality adapter, trained on English speech data, to align self-supervised speech features with instruction-tuned LLMs for speech understanding and translation.

Findings

01. Effective preservation of semantic content in speech-to-text translation

02. Successful integration of self-supervised speech models with instruction-tuned LLMs

03. Potential for improved multi-modal speech understanding applications

Abstract

With the growing influence of Large Language Models (LLMs), there is increasing interest in integrating speech representations with them to enable more seamless multi-modal processing and speech understanding. This study introduces a novel approach that combines self-supervised speech representations with instruction-tuned LLMs for speech-to-text translation. The proposed approach leverages a modality adapter to align extracted speech features with instruction-tuned LLMs using English speech data. Our experiments demonstrate that this method effectively preserves the semantic content of the input speech and serves as an effective bridge between self-supervised speech models and instruction-tuned LLMs, offering a promising approach for various speech understanding applications.
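
The pipeline the abstract describes (speech features, then a modality adapter, then an instruction-tuned LLM) can be illustrated with a minimal PyTorch sketch. This page does not specify the adapter's architecture, so everything here is an assumption made for illustration: the class name `QueryAdapter`, the use of learnable queries with cross-attention, and all dimensions are hypothetical, and random tensors stand in for the speech encoder output and the LLM's prompt embeddings.

```python
import torch
import torch.nn as nn


class QueryAdapter(nn.Module):
    """Hypothetical modality adapter (not the authors' implementation).

    A fixed set of learnable queries cross-attends to variable-length
    speech features and is projected into the LLM embedding space.
    """

    def __init__(self, speech_dim=768, llm_dim=1024, num_queries=8, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, speech_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(speech_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(speech_dim)
        self.proj = nn.Linear(speech_dim, llm_dim)

    def forward(self, speech_feats, padding_mask=None):
        # speech_feats: (batch, frames, speech_dim)
        # padding_mask: optional bool tensor (batch, frames), True marks padded frames
        batch = speech_feats.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        attended, _ = self.cross_attn(
            q, speech_feats, speech_feats, key_padding_mask=padding_mask
        )
        # Residual connection, normalization, then projection to the LLM width
        return self.proj(self.norm(q + attended))


torch.manual_seed(0)
adapter = QueryAdapter()

# Stand-ins (assumptions): output of a frozen self-supervised speech encoder,
# and the embedded tokens of a text instruction prompt.
speech_feats = torch.randn(2, 50, 768)
prompt_embeds = torch.randn(2, 5, 1024)

speech_tokens = adapter(speech_feats)                          # (2, 8, 1024)
llm_inputs = torch.cat([speech_tokens, prompt_embeds], dim=1)  # (2, 13, 1024)
print(speech_tokens.shape, llm_inputs.shape)
```

In a setup like this, the concatenated sequence would be passed to the LLM as input embeddings, with only the adapter trained; whether SparQLe freezes the encoder and the LLM is not stated on this page.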

Code & Models

Repositories

djanibekov/rebooting-llm (PyTorch, official)

Models

amupd/SparQLe (Hugging Face model)

Taxonomy

Topics: Natural Language Processing Techniques · Library Science and Information Systems · Mathematics, Computing, and Information Processing

Methods: Adapter · ALIGN