CoLMbo: Speaker Language Model for Descriptive Profiling

Massa Baali; Shuo Han; Syed Abdul Hannan; Purusottam Samal; Karanveer Singh; Soham Deshmukh; Rita Singh; Bhiksha Raj

arXiv:2506.09375·cs.CL·August 26, 2025

TL;DR

CoLMbo is a novel speaker language model that generates detailed, customizable speaker profiles and descriptions by integrating speaker embeddings with prompt-based conditioning, improving zero-shot speaker profiling across diverse datasets.

Contribution

Introduces CoLMbo, a speaker language model that combines speaker embeddings with prompt-based conditioning to produce detailed, customizable speaker descriptions and profiles.

Findings

01. Effective in zero-shot scenarios across diverse datasets
02. Generates detailed, customizable speaker profiles
03. Enhances traditional speaker recognition with descriptive capabilities

Abstract

Speaker recognition systems are often limited to classification tasks and struggle to generate detailed speaker characteristics or provide context-rich descriptions. These models primarily extract embeddings for speaker identification but fail to capture demographic attributes such as dialect, gender, and age in a structured manner. This paper introduces CoLMbo, a Speaker Language Model (SLM) that addresses these limitations by integrating a speaker encoder with prompt-based conditioning. This allows for the creation of detailed captions based on speaker embeddings. CoLMbo utilizes user-defined prompts to adapt dynamically to new speaker characteristics and provides customized descriptions, including regional dialect variations and age-related traits. This innovative approach not only enhances traditional speaker profiling but also excels in zero-shot scenarios across diverse datasets,…
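The abstract does not say how the speaker encoder output and the user prompt are combined. One common pattern for this kind of conditioning is to project the speaker embedding into a few "prefix" vectors in the language model's embedding space and place them ahead of the prompt's token embeddings. The sketch below illustrates only that general idea with toy dimensions and random weights; every name in it (`speaker_prefix`, `build_decoder_input`, `NUM_PREFIX`, and so on) is hypothetical and should not be read as CoLMbo's actual implementation.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) matrix standing in for a learned projection."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, vector):
    """Multiply a weight matrix by a vector (no bias term)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def speaker_prefix(speaker_embedding, weights, num_prefix, model_dim):
    """Map one speaker embedding to `num_prefix` vectors of size `model_dim`."""
    flat = apply_linear(weights, speaker_embedding)
    return [flat[i * model_dim:(i + 1) * model_dim] for i in range(num_prefix)]


def build_decoder_input(speaker_embedding, prompt_token_ids, token_table,
                        weights, num_prefix, model_dim):
    """Concatenate speaker prefix vectors with the prompt's token embeddings."""
    prefix = speaker_prefix(speaker_embedding, weights, num_prefix, model_dim)
    prompt = [token_table[t] for t in prompt_token_ids]
    return prefix + prompt


# Toy sizes, chosen only for illustration (assumptions, not values from the paper).
SPEAKER_DIM, MODEL_DIM, NUM_PREFIX, VOCAB = 8, 4, 2, 10

rng = random.Random(0)
weights = make_linear(SPEAKER_DIM, NUM_PREFIX * MODEL_DIM, rng)
token_table = [[rng.uniform(-1, 1) for _ in range(MODEL_DIM)] for _ in range(VOCAB)]

# Stand-ins for a speaker encoder output and a tokenized user prompt.
speaker_embedding = [rng.uniform(-1, 1) for _ in range(SPEAKER_DIM)]
prompt_token_ids = [3, 1, 4]

decoder_input = build_decoder_input(
    speaker_embedding, prompt_token_ids, token_table, weights, NUM_PREFIX, MODEL_DIM
)
print(len(decoder_input), "vectors of size", len(decoder_input[0]))
```

In a design like this, changing the prompt changes only the token part of the sequence, so the same speaker embedding can be queried for different attributes (for example dialect or age) without re-encoding the audio.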

Code & Models

Repositories

massabaali7/colmbo (PyTorch, official)

Models

cmu-mlsp/CoLMbo (Hugging Face model, 124 downloads)

Taxonomy

Topics: Speech Recognition and Synthesis · Authorship Attribution and Profiling · Face recognition and analysis