Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM

Thomas Thebaud; Yen-Ju Lu; Matthew Wiesner; Peter Viechnicki; Najim Dehak

arXiv:2508.04795·cs.CL·September 10, 2025

TL;DR

This paper enriches dialogue transcriptions with speaker metadata such as age, gender, and emotion by coupling frozen audio foundation models with a frozen LLM through lightweight connectors, with no task-specific fine-tuning of either model.

Contribution

It introduces a novel approach combining frozen audio and language models to infer speaker attributes in dialogues without task-specific fine-tuning.

Findings

01 Achieves competitive speaker profiling performance

02 Maintains modularity and speed in processing

03 Attains 8.8% EER in speaker comparison tasks
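
For readers unfamiliar with the metric in the third finding: the Equal Error Rate (EER) is the operating point at which the false acceptance rate (different-speaker pairs accepted as the same speaker) equals the false rejection rate (same-speaker pairs rejected). The following is a minimal sketch of how an EER can be computed from a list of comparison scores; the function name `equal_error_rate` and the toy scores are illustrative assumptions and are not taken from the paper, whose evaluation protocol is not described on this page.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Return (eer, threshold) for a set of speaker-comparison trials.

    scores: higher means "more likely the same speaker".
    labels: 1 for same-speaker trials, 0 for different-speaker trials.
    Hypothetical helper for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = int(labels.sum())
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one trial of each class")

    # Candidate thresholds: every distinct score. A trial is accepted
    # when its score is >= the threshold.
    thresholds = np.unique(scores)
    far = np.array([((scores >= t) & (labels == 0)).sum() / n_nontarget
                    for t in thresholds])
    frr = np.array([((scores < t) & (labels == 1)).sum() / n_target
                    for t in thresholds])

    # The EER is read off where the two error curves are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy trials (made up): one same-speaker pair scores below one
# different-speaker pair, so the two error rates meet above zero.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

This brute-force scan over thresholds is quadratic in the number of trials, which is fine for a sketch; larger evaluations typically sort the scores once and sweep.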

Abstract

In dialogue transcription pipelines, Large Language Models (LLMs) are frequently employed in post-processing to improve grammar, punctuation, and readability. We explore a complementary post-processing step: enriching transcribed dialogues by adding metadata tags for speaker characteristics such as age, gender, and emotion. Some of the tags are global to the entire dialogue, while some are time-variant. Our approach couples frozen audio foundation models, such as Whisper or WavLM, with a frozen LLAMA language model to infer these speaker attributes, without requiring task-specific fine-tuning of either model. Using lightweight, efficient connectors to bridge audio and language representations, we achieve competitive performance on speaker profiling tasks while preserving modularity and speed. Additionally, we demonstrate that a frozen LLAMA model can compare x-vectors directly,…
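
To make the connector idea in the abstract concrete, here is a minimal sketch of one possible design, assuming a single linear projection over mean-pooled audio frames. The abstract does not specify the connector architecture, so the class `LinearConnector`, the helper `build_llm_input`, the pooling scheme, and all dimensions below are hypothetical; random arrays stand in for the outputs of the frozen audio model and for the LLM's prompt embeddings.

```python
import numpy as np


class LinearConnector:
    """Hypothetical connector: pools frame-level features from a frozen
    audio model into a few vectors and projects them into the embedding
    space of a frozen LLM. Only W and b would be trained."""

    def __init__(self, audio_dim, llm_dim, n_tokens, seed=0):
        rng = np.random.default_rng(seed)
        self.n_tokens = n_tokens
        self.W = rng.normal(0.0, 0.02, size=(audio_dim, llm_dim))
        self.b = np.zeros(llm_dim)

    def __call__(self, frames):
        # frames: (T, audio_dim) array of frozen audio-encoder outputs.
        if frames.shape[0] < self.n_tokens:
            raise ValueError("need at least n_tokens frames")
        # Split the time axis into n_tokens chunks and average each one.
        chunks = np.array_split(frames, self.n_tokens, axis=0)
        pooled = np.stack([c.mean(axis=0) for c in chunks])
        # Project to the LLM embedding size: (n_tokens, llm_dim).
        return pooled @ self.W + self.b


def build_llm_input(audio_tokens, prompt_embeds):
    """Prepend the projected audio vectors to the prompt embeddings."""
    return np.concatenate([audio_tokens, prompt_embeds], axis=0)


# Stand-in data with illustrative sizes (not taken from the paper).
rng = np.random.default_rng(1)
frames = rng.normal(size=(200, 768))        # 200 audio frames
prompt_embeds = rng.normal(size=(12, 4096)) # 12 prompt token embeddings

connector = LinearConnector(audio_dim=768, llm_dim=4096, n_tokens=8)
audio_tokens = connector(frames)
llm_input = build_llm_input(audio_tokens, prompt_embeds)
print(audio_tokens.shape, llm_input.shape)
```

In a setup of this kind both large models stay frozen, so the connector's weights are the only parameters that need training, which is consistent with the modularity and speed the abstract emphasizes.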
