Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMs

Hagai Aronowitz; Zvi Kons; Avihu Dekel; George Saon; Ron Hoory

arXiv:2604.11269 · eess.AS · April 14, 2026

TL;DR

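This paper adapts Granite-speech, a speech-aware LLM, to speaker-attributed ASR, in which relative speaker identity tags are written directly into the transcript. The adaptation needs only minimal architectural changes; accuracy gains come from jointly trained speaker cluster identification tags and from data augmentation.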

Contribution

It introduces speaker cluster identification tags and a data augmentation method, enabling effective adaptation of speech-aware LLMs for speaker-attributed ASR.
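To make the two tag formats concrete, here is a minimal sketch of how a tagged training target could be rendered from a list of speaker turns. It is an illustration, not the authors' code: the function name `format_saa_target`, the `cluster_of` mapping, and the rule that relative speaker numbers follow order of first appearance are all assumptions. The page does not say how cluster IDs are obtained, so the sketch simply takes them as given.

```python
from typing import Dict, List, Optional, Tuple


def format_saa_target(
    turns: List[Tuple[str, str]],
    cluster_of: Optional[Dict[str, int]] = None,
) -> str:
    """Render (speaker_id, text) turns as a speaker-tagged transcript.

    Assumption: relative speaker numbers are assigned in order of first
    appearance. `cluster_of` is a hypothetical mapping from a global
    speaker id to a cluster id; when given, the cluster id is added to
    each tag, e.g. "[Speaker 1 cluster 42]:".
    """
    relative: Dict[str, int] = {}
    lines: List[str] = []
    for speaker_id, text in turns:
        # First time we see a speaker, give them the next relative index.
        if speaker_id not in relative:
            relative[speaker_id] = len(relative) + 1
        tag = f"Speaker {relative[speaker_id]}"
        if cluster_of is not None:
            tag += f" cluster {cluster_of[speaker_id]}"
        lines.append(f"[{tag}]: {text}")
    return "\n".join(lines)
```

Calling it without `cluster_of` yields plain `[Speaker 1]:` style tags; passing the mapping yields the cluster-augmented form, so the same turn list can produce both kinds of target.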

Findings

01 Significant accuracy improvements over traditional pipelines.

02 Effective use of minimal architectural modifications.

03 Successful evaluation across multiple benchmarks.

Abstract

Speaker-Attributed Automatic Speech Recognition (SAA) enhances traditional ASR systems by incorporating relative speaker identity tags directly into the transcript (e.g., [Speaker 1]:, [Speaker 2]:). In this work, we extend the capabilities of Granite-speech, a state-of-the-art speech-aware Large Language Model (LLM) originally trained for transcription and translation. We demonstrate that it can be effectively adapted for SAA with only minimal architectural changes. Our core contribution is the introduction of speaker cluster identification tags (e.g., [Speaker 1 cluster 42]:) which are jointly trained with SAA to significantly improve accuracy. To address limitations in training data, we propose a data augmentation method that uses artificially concatenated multi-speaker conversations. Our approach is evaluated across multiple benchmarks and shows superior performance compared to…
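The augmentation idea in the abstract, artificially concatenating speech into multi-speaker conversations, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the `Utterance` layout, the function `make_artificial_conversation`, uniform random turn selection, and back-to-back concatenation with no silence or overlap are all hypothetical choices. Audio is represented as plain lists of float samples to keep the example dependency-free.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (speaker_id, audio samples, transcript text).
Utterance = Tuple[str, List[float], str]


def make_artificial_conversation(
    pool: Sequence[Utterance],
    num_speakers: int,
    num_turns: int,
    rng: random.Random,
) -> Tuple[List[float], str]:
    """Concatenate single-speaker utterances into one tagged conversation.

    Returns the concatenated audio and a transcript whose lines carry
    relative speaker tags numbered by order of first appearance.
    Consecutive turns may come from the same speaker.
    """
    # Group the pool by speaker so turns can be drawn per speaker.
    by_speaker: Dict[str, List[Utterance]] = {}
    for utt in pool:
        by_speaker.setdefault(utt[0], []).append(utt)

    # Pick which speakers take part in this artificial conversation.
    speakers = rng.sample(sorted(by_speaker), num_speakers)

    audio: List[float] = []
    lines: List[str] = []
    relative: Dict[str, int] = {}
    for _ in range(num_turns):
        spk = rng.choice(speakers)
        _, samples, text = rng.choice(by_speaker[spk])
        if spk not in relative:
            relative[spk] = len(relative) + 1
        audio.extend(samples)  # back-to-back, no gap (an assumption)
        lines.append(f"[Speaker {relative[spk]}]: {text}")
    return audio, "\n".join(lines)
```

Passing a seeded `random.Random` keeps the generated conversations reproducible, which is useful when the same augmented set must be regenerated across training runs.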
