# Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration

**Authors:** Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen, Takafumi Koshinaka

arXiv: 1906.08556 · 2019-06-21

## TL;DR

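This paper shows that GPU acceleration makes i-vector extraction fast enough (frame posterior computation 3000 times faster than real time, extractor training 25 times faster than Kaldi's CPU baseline) to run experiments that were previously impractical, such as updating the UBM and re-computing frame alignments during extractor training.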

## Contribution

The paper applies GPU acceleration to i-vector extraction, which makes training fast enough to compare extractor variations over multiple randomly initialized runs, and it documents previously undocumented details of Kaldi's i-vector extractor.

## Key findings

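- Frame posterior computation runs 3000x faster than real time
- I-vector extractor training is 25x faster than the Kaldi CPU baseline
- Updating the UBM and re-computing frame alignments while training the i-vector extractor is beneficial
- Kaldi's i-vector extractor outperforms the standard i-vector formulation by 1 to 2% on the VoxCeleb speaker verification protocol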

## Abstract

Speaker embeddings are continuous-value vector representations that allow easy comparison between voices of speakers with simple geometric operations. Among others, i-vector and x-vector have emerged as the mainstream methods for speaker embedding. In this paper, we illustrate the use of a modern computation platform to harness the benefit of GPU acceleration for i-vector extraction. In particular, we achieve an acceleration of 3000 times in frame posterior computation compared to real time and 25 times in training the i-vector extractor compared to the CPU baseline from the Kaldi toolkit. This significant speed-up allows the exploration of ideas that were hitherto impossible. In particular, we show that it is beneficial to update the universal background model (UBM) and re-compute frame alignments while training the i-vector extractor. Additionally, we are able to study different variations of i-vector extractors more rigorously than before. In this process, we reveal some undocumented details of Kaldi's i-vector extractor and show that it outperforms the standard formulation by a margin of 1 to 2% when tested with the VoxCeleb speaker verification protocol. All of our findings are asserted by ensemble averaging the results from multiple runs with random start.
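To make "frame posterior computation" concrete, here is a minimal sketch of how the posterior of each UBM component can be computed for a single feature frame. It assumes a diagonal-covariance GMM as the UBM and uses plain Python on the CPU; it is not the authors' GPU implementation. The function names and the toy two-component model are hypothetical and chosen only for illustration.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def frame_posteriors(frame, weights, means, variances):
    """Posterior probability of each mixture component given one frame."""
    # Log of (component weight * component likelihood) for every component.
    log_joint = [
        math.log(w) + log_gaussian_diag(frame, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Normalize with log-sum-exp to avoid underflow for distant components.
    peak = max(log_joint)
    log_norm = peak + math.log(sum(math.exp(l - peak) for l in log_joint))
    return [math.exp(l - log_norm) for l in log_joint]


# Hypothetical toy UBM: two components in a 2-dimensional feature space.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [4.0, 4.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

near_first = frame_posteriors([0.1, -0.2], weights, means, variances)
midpoint = frame_posteriors([2.0, 2.0], weights, means, variances)
```

Every frame is processed independently of the others, so the same arithmetic can be batched over all frames and components as dense matrix operations, which is the kind of workload that tends to suit a GPU.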

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.08556/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1906.08556/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1906.08556/full.md

---
Source: https://tomesphere.com/paper/1906.08556