# Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification

**Authors:** Youngmoon Jung, Younggwan Kim, Hyungjun Lim, Yeunju Choi, Hoirin Kim

arXiv: 1906.08333 · 2019-12-30

## TL;DR

This paper introduces spatial pyramid encoding (SPE), a pooling method that builds speaker embeddings for text-independent speaker verification, and combines it with ring loss-based deep length normalization. On VoxCeleb1, the resulting system outperforms both i-vector and d-vector baselines.

## Contribution

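The paper proposes spatial pyramid encoding (SPE), a pooling layer that partitions ResNet feature maps into increasingly fine sub-regions and encodes each one with a learnable dictionary encoding layer. It pairs SPE with ring loss, which lets the network learn to length-normalize the embeddings while preserving convexity.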

## Key findings

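- The proposed system (SPE layer plus ring loss) outperforms both i-vector and d-vector baselines on VoxCeleb1
- The SPE layer generates a fixed-dimensional embedding from a variable-length speech segment and aggregates feature distributions from multi-level temporal bins
- Ring loss lets the network gradually learn to length-normalize the embeddings through its own weights while preserving convexity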
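
The ring loss named in the findings can be illustrated with a small sketch. This summary does not state the formula, so the code uses the commonly cited form of ring loss: a penalty on the squared gap between each embedding's L2 norm and a target radius R, averaged over the batch. The function name `ring_loss`, the `weight` default, and passing `radius` as a plain argument (in training it would typically be a learned parameter) are assumptions made for illustration, not details taken from the paper.

```python
import math


def ring_loss(embeddings, radius, weight=0.01):
    """Sketch of a ring-loss penalty: (weight / 2) * mean((||f||_2 - radius)^2).

    embeddings: list of equal-length lists of floats (one embedding per item).
    radius:     target norm R; assumed fixed here, usually learned in training.
    weight:     loss weight (hypothetical default, not from the paper).
    """
    # L2 norm of every embedding in the batch.
    norms = [math.sqrt(sum(x * x for x in f)) for f in embeddings]
    # Mean squared deviation of the norms from the target radius.
    mean_sq_gap = sum((n - radius) ** 2 for n in norms) / len(norms)
    return 0.5 * weight * mean_sq_gap
```

In a training setup of this kind, the penalty would be added to the main classification loss, so the network is pushed toward embeddings of similar length rather than having them rescaled by a separate normalization step.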

## Abstract

In this paper, we propose a new pooling method called spatial pyramid encoding (SPE) to generate speaker embeddings for text-independent speaker verification. We first partition the output feature maps from a deep residual network (ResNet) into increasingly fine sub-regions and extract speaker embeddings from each sub-region through a learnable dictionary encoding layer. These embeddings are concatenated to obtain the final speaker representation. The SPE layer not only generates a fixed-dimensional speaker embedding for a variable-length speech segment, but also aggregates the information of feature distribution from multi-level temporal bins. Furthermore, we apply deep length normalization by augmenting the loss function with ring loss. By applying ring loss, the network gradually learns to normalize the speaker embeddings using model weights themselves while preserving convexity, leading to more robust speaker embeddings. Experiments on the VoxCeleb1 dataset show that the proposed system using the SPE layer and ring loss-based deep length normalization outperforms both i-vector and d-vector baselines.
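
The pyramid idea in the abstract, splitting the time axis into increasingly fine bins and concatenating one vector per bin, can be sketched in a few lines. This is a simplified illustration, not the paper's layer: plain mean pooling stands in for the learnable dictionary encoding layer, and the function name `temporal_pyramid_pool` and the pyramid levels `(1, 2, 4)` are assumptions chosen for the example.

```python
def temporal_pyramid_pool(frames, levels=(1, 2, 4)):
    """Pool a variable-length sequence into a fixed-length vector.

    frames: non-empty list of frame-level feature vectors (lists of floats),
            all with the same dimension.
    levels: number of temporal bins at each pyramid level (assumed values).
    Returns a flat list of length sum(levels) * dim.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    pooled = []
    for bins in levels:
        for b in range(bins):
            # Bin boundaries on the time axis; each bin keeps at least one frame,
            # so very short inputs reuse frames instead of producing empty bins.
            start = (b * num_frames) // bins
            end = max(((b + 1) * num_frames) // bins, start + 1)
            segment = frames[start:end]
            # Mean pooling as a stand-in for the per-bin encoding layer.
            pooled.extend(
                sum(f[d] for f in segment) / len(segment) for d in range(dim)
            )
    return pooled


# Two utterances of different length map to vectors of the same size.
short = [[float(t), 1.0, 0.0] for t in range(5)]
long = [[float(t), 1.0, 0.0] for t in range(50)]
assert len(temporal_pyramid_pool(short)) == len(temporal_pyramid_pool(long))
```

The output size depends only on the feature dimension and the number of bins, which is what allows a fixed-dimensional embedding for speech segments of any length, while the finer levels keep some information about where in the segment each feature occurred.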

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.08333/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1906.08333/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/1906.08333/full.md

---
Source: https://tomesphere.com/paper/1906.08333