# On Residual CNN in text-dependent speaker verification task

**Authors:** Egor Malykh, Sergey Novoselov, Oleg Kudashev

arXiv: 1705.10134 · 2017-05-31

## TL;DR

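This paper explores deep residual CNNs with spectrogram inputs for text-dependent speaker verification. The proposed system alone reaches 5.23% EER on the RSR2015 evaluation part without surpassing the baseline, while fusing it with the baseline outperforms the best individual system by 18% relative.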

## Contribution

The paper applies a deep residual CNN, fed with spectrograms, to text-dependent speaker verification and evaluates it on RSR2015. The proposed system does not surpass the baseline on its own, but it improves results when fused with the baseline.

## Key findings

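- The residual CNN system achieved 5.23% EER (equal error rate) on the RSR2015 evaluation part.
- Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative.
- The residual CNN alone did not surpass the baseline, but the authors consider the result good for such a new approach.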
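
The equal error rate (EER) reported above is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how such a figure can be estimated from trial scores; the function name `equal_error_rate` and the toy scores are illustrative assumptions, not the authors' evaluation code or data.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER (as a fraction) by sweeping every observed score as a threshold.

    Hypothetical helper for illustration; higher scores are assumed to mean
    "more likely the same speaker".
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: target trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: impostor trials scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (made up for this example, not from the paper).
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, impostors))  # 0.25
```

With few trials the two error curves rarely cross exactly, so this sketch averages them at the closest point; evaluation toolkits typically interpolate instead.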

## Abstract

Deep learning approaches are still not very common in the speaker verification field. We investigate the possibility of using a deep residual convolutional neural network with spectrograms as input features in the text-dependent speaker verification task. Although we were not able to surpass the baseline system in quality, we achieved quite good results for such a new approach, obtaining a 5.23% EER on the RSR2015 evaluation part. Fusion of the baseline and proposed systems outperformed the best individual system by 18% relative.
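
The 18% figure is a relative improvement: the reduction in error divided by the error of the best individual system. A minimal sketch of that arithmetic follows; the helper name `relative_improvement` and the example error rates are hypothetical, chosen only to produce an 18% reduction, since this summary does not state the baseline or fused EER values.

```python
def relative_improvement(best_individual_eer, fused_eer):
    """Relative error reduction of the fused system over the best single system."""
    return (best_individual_eer - fused_eer) / best_individual_eer


# Hypothetical values, NOT taken from the paper: 5.0% EER reduced to 4.1% EER.
gain = relative_improvement(5.0, 4.1)
print(f"{gain:.0%}")  # 18%
```

A relative figure should not be read as an absolute drop: in this made-up example the absolute reduction is only 0.9 percentage points.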

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.10134/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1705.10134/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1705.10134/full.md

---
Source: https://tomesphere.com/paper/1705.10134