Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network

Irfan Nafiz Shahan; Pulok Ahmed Auvi

arXiv:2411.15082·cs.SD·November 25, 2024

TL;DR

This paper introduces a lightweight 1D-CNN for speaker identification that performs well on minimal datasets, reaching a validation accuracy of 97.87%, and supports reproducibility by releasing its code, dataset, and trained models.

Contribution

The paper presents a lightweight 1D-CNN architecture for speaker identification on minimal datasets, uses data augmentation to cope with background noise and limited training samples, and releases its code, custom dataset, and trained models.
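
As a rough illustration of what a lightweight 1D-CNN classifier computes on raw audio, the NumPy sketch below runs a single convolution layer, a ReLU, global average pooling, and a softmax over speakers. The layer sizes, the random weights, and all function names are illustrative assumptions; this is not the architecture from the paper or the RecMe repository.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1D convolution (cross-correlation, as in most deep learning libraries).

    x:       (time,) mono signal
    kernels: (n_filters, kernel_size)
    bias:    (n_filters,)
    returns  (n_filters, time - kernel_size + 1)
    """
    # Each row of `windows` is one kernel-sized slice of the signal.
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1])
    return kernels @ windows.T + bias[:, None]

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def speaker_probs(x, kernels, bias, dense_w, dense_b):
    """Conv -> ReLU -> global average pooling -> dense -> softmax."""
    feat = np.maximum(conv1d(x, kernels, bias), 0.0)
    pooled = feat.mean(axis=1)  # one value per filter
    return softmax(dense_w @ pooled + dense_b)

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
n_filters, kernel_size, n_speakers = 8, 16, 4
x = rng.standard_normal(1600)  # stand-in for a short audio clip
kernels = rng.standard_normal((n_filters, kernel_size)) * 0.1
bias = np.zeros(n_filters)
dense_w = rng.standard_normal((n_speakers, n_filters)) * 0.1
dense_b = np.zeros(n_speakers)

probs = speaker_probs(x, kernels, bias, dense_w, dense_b)
print(probs.shape)
```

With untrained random weights the probabilities carry no meaning; the point is only the shape of the computation, where the convolution slides over time and pooling collapses the time axis so clips of different lengths map to a fixed-size vector.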

Findings

01

Validation accuracy of 97.87% achieved.

02

Effective handling of background noise with data augmentation.

03

Open-source code and datasets provided for reproducibility.

Abstract

Voice recognition and speaker identification are vital for applications in security and personal assistants. This paper presents a lightweight 1D-Convolutional Neural Network (1D-CNN) designed to perform speaker identification on minimal datasets. Our approach achieves a validation accuracy of 97.87%, leveraging data augmentation techniques to handle background noise and limited training samples. Future improvements include testing on larger datasets and integrating transfer learning methods to enhance generalizability. We provide all code, the custom dataset, and the trained models to facilitate reproducibility. These resources are available on our GitHub repository: https://github.com/IrfanNafiz/RecMe.
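
One common way to augment speech data against background noise is to mix a noise recording into each clean clip at a chosen signal-to-noise ratio. The sketch below shows that idea in NumPy; the function name `mix_at_snr`, the SNR value, and the synthetic signals are assumptions made for illustration, and the paper's actual augmentation pipeline may differ.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so the result has the requested SNR in dB."""
    # Repeat or trim the noise so it matches the speech length.
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(speech_power / scaled_noise_power)
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = np.sqrt(target_noise_power / noise_power)
    return speech + scale * noise

# Synthetic stand-ins for a clean clip and a background-noise recording.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
speech = np.sin(2 * np.pi * 220.0 * t)
noise = rng.standard_normal(4000)

noisy = mix_at_snr(speech, noise, snr_db=10.0)

# Check the achieved SNR from the added noise component.
added = noisy - speech
achieved_snr_db = 10 * np.log10(np.mean(speech ** 2) / np.mean(added ** 2))
print(round(float(achieved_snr_db), 3))
```

Drawing a different noise clip and SNR for each training example turns a small set of recordings into many distinct noisy variants, which is the general motivation for this kind of augmentation on limited data.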

Code & Models

Repositories

irfannafiz/recme · tf · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing