Multi-Axis Speech Similarity via Factor-Partitioned Embeddings

Jim O'Regan; Jens Edlund

arXiv:2605.02804 · eess.AS · May 11, 2026

TL;DR

This paper introduces a factor-partitioned embedding framework for speech that separates multiple attributes into distinct subspaces, enabling attribute-conditioned retrieval and suppression of biases.

Contribution

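The paper proposes a multi-axis embedding method that partitions a single utterance vector into per-attribute subspaces, so that retrieval can weight or suppress individual attributes rather than relying on one conflated similarity score, as conventional single-vector embeddings do.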

Findings

01

Embeddings support attribute-conditioned retrieval with attribute suppression.

02

Signed axis weighting reduces same-speaker bias in cross-corpus retrieval.

03

The code is publicly available on GitHub in the jimregan/spoken-sentence-transformers repository.

Abstract

Speech encodes multiple simultaneous attributes -- linguistic content, speaker identity, dialect, gender -- that conventional single-vector embeddings conflate. We present a factor-partitioned embedding framework that maps each utterance into a single vector whose subspaces correspond to distinct axes of variation. A shared acoustic encoder feeds per-axis linear projection heads, each trained via distillation from a specialist teacher or a contrastive objective over shared-label pairs. The resulting embeddings support attribute-conditioned retrieval: similarity is computed as a signed weighted sum over per-axis cosine scores, allowing retrieval that jointly considers what was said and how -- or explicitly suppresses one attribute to surface another. We evaluate on cross-corpus retrieval over corpora sharing the Harvard sentence prompts, demonstrating that signed axis weighting can…
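The signed weighted sum over per-axis cosine scores described in the abstract can be illustrated with a minimal sketch. The axis names, subspace boundaries, toy vectors, and the function name `multi_axis_similarity` are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math

# Hypothetical layout: axis name -> (start, end) slice of the full embedding.
# The real model's axes and dimensions are not given in the abstract.
AXES = {"content": (0, 3), "speaker": (3, 5)}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def multi_axis_similarity(x, y, weights, axes=AXES):
    """Signed weighted sum of per-axis cosine scores.

    A positive weight rewards agreement on an axis; a negative weight
    penalises it, which is one way to suppress an attribute.
    """
    total = 0.0
    for name, w in weights.items():
        start, end = axes[name]
        total += w * cosine(x[start:end], y[start:end])
    return total


# Toy 5-dimensional embeddings: 3 content dimensions, then 2 speaker dimensions.
query = [1.0, 0.0, 0.0, 1.0, 0.0]
same_speaker_other_text = [0.0, 1.0, 0.0, 1.0, 0.0]
other_speaker_same_text = [1.0, 0.0, 0.0, 0.0, 1.0]

# Reward matching content, penalise matching speaker.
weights = {"content": 1.0, "speaker": -0.5}

score_same_speaker = multi_axis_similarity(query, same_speaker_other_text, weights)
score_same_text = multi_axis_similarity(query, other_speaker_same_text, weights)
print(score_same_speaker, score_same_text)
```

With the negative speaker weight, the utterance that shares the query's content but comes from a different speaker scores higher than the one that shares only the speaker, which is the kind of attribute suppression the abstract describes.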

Code & Models

Repositories

jimregan/spoken-sentence-transformers (GitHub)
