openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer
Kishan K C, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee

TL;DR
This paper introduces openFEAT, a transformer-based framework that adapts speaker embeddings for household-specific identification, significantly improving accuracy in challenging few-shot, open-set scenarios.
Contribution
The paper proposes a novel embedding adaptation method using transformers for open-set few-shot speaker identification in households, addressing limitations of universal embeddings.
Findings
Reduces the speaker identification equal error rate (IEER) by 23-31% on simulated households.
Adapting speaker embeddings to the household improves discrimination among similar voices.
Household-specific embedding spaces matter for accurate speaker identification.
Abstract
Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics. A common embedding space learned from a large number of speakers is not universally applicable for the optimal identification of every speaker in a household. In this work, we first formulate household speaker identification as a few-shot open-set recognition task and then propose a novel embedding adaptation framework to adapt speaker representations from the given universal embedding space to a household-specific embedding space using a set-to-set function, yielding better household speaker identification performance. With our algorithm, Open-set Few-shot Embedding Adaptation with Transformer (openFEAT), we observe that the speaker identification equal error rate (IEER) on simulated households…
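The abstract's pipeline (few enrollment utterances per speaker, a set-to-set adaptation of the household's speaker representations, and an open-set decision that can reject unknown voices) can be illustrated with a rough sketch. This is not the authors' implementation. It assumes one self-attention pass over per-speaker centroids with identity projections in place of learned transformer weights, cosine scoring, and a hypothetical rejection threshold of 0.7. The function names and toy embeddings are invented for illustration.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def prototypes(enrollment):
    """Average each speaker's few enrollment embeddings into one centroid."""
    protos = {}
    for speaker, utterances in enrollment.items():
        dim = len(utterances[0])
        mean = [sum(u[i] for u in utterances) / len(utterances) for i in range(dim)]
        protos[speaker] = l2_normalize(mean)
    return protos

def adapt(protos):
    """Set-to-set step: every prototype attends to all prototypes in the household.

    Assumption: identity query/key/value projections stand in for the learned
    transformer weights, so this shows the data flow only.
    """
    names = list(protos)
    dim = len(protos[names[0]])
    scale = math.sqrt(dim)
    adapted = {}
    for q in names:
        scores = [dot(protos[q], protos[k]) / scale for k in names]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        attended = [
            sum(w / total * protos[k][i] for w, k in zip(weights, names))
            for i in range(dim)
        ]
        # Residual connection, then renormalize for cosine scoring.
        adapted[q] = l2_normalize([p + a for p, a in zip(protos[q], attended)])
    return adapted

def identify(test_embedding, adapted, threshold=0.7):
    """Open-set decision: best-matching speaker, or None for an unknown voice."""
    test = l2_normalize(test_embedding)
    best, score = max(
        ((s, dot(test, p)) for s, p in adapted.items()),
        key=lambda pair: pair[1],
    )
    return (best if score >= threshold else None), score

# Toy household: two speakers, two enrollment utterances each (3-dim embeddings).
household = {
    "alice": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "bob": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
adapted = adapt(prototypes(household))
print(identify([1.0, 0.0, 0.0], adapted))  # close to alice's enrollment
print(identify([0.0, 0.0, 1.0], adapted))  # unlike any enrolled speaker
```

With untrained identity projections, the attention step simply mixes the prototypes and does not by itself make them more separable. Any discriminative gain of the kind the paper reports would have to come from training the adaptation function, which this sketch omits.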
Taxonomy
Methods
Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Residual Connection · Layer Normalization · Label Smoothing · Dropout · Dense Connections
