G-IFT: A Gated Linear Unit adapter with Iterative Fine-Tuning for Low-Resource Children's Speaker Verification

Vishwas M. Shetty; Jiusi Zheng; Abeer Alwan

arXiv:2508.07836 · eess.AS · August 12, 2025

TL;DR

This paper introduces G-IFT, an iterative fine-tuning framework built around a Gated Linear Unit adapter, which reduces the equal error rate (EER) of children's speaker verification across ECAPA-TDNN, ResNet, and X-vector architectures.

Contribution

The paper presents a new G-IFT framework that enhances knowledge transfer from adult to children's speech in speaker verification, effective across multiple architectures.

Findings

01. Consistent EER reduction across architectures

02. Effective knowledge transfer from adult to children's speech

03. Framework is architecture-agnostic

Abstract

Speaker Verification (SV) systems trained on adult speech often underperform on children's speech because of the acoustic mismatch, and the limited amount of children's speech data makes fine-tuning less effective. In this paper, we propose a framework, a Gated Linear Unit adapter with Iterative Fine-Tuning (G-IFT), to enhance knowledge transfer efficiency between the high-resource adult speech domain and the low-resource children's speech domain. In this framework, a Gated Linear Unit adapter is first inserted between the pre-trained speaker embedding model and the classifier. Then the classifier, adapter, and pre-trained speaker embedding model are optimized sequentially in an iterative way. The framework is agnostic to the underlying architecture of the SV system. Our experiments on ECAPA-TDNN, ResNet, and X-vector architectures using the OGI and MyST datasets demonstrate that…
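The two mechanisms named in the abstract, a gated adapter sitting between the embedding model and the classifier and a sequential optimization order, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. It assumes the common Gated Linear Unit form (a linear "value" branch multiplied elementwise by a sigmoid "gate" branch), and it assumes that only one module is trainable at a time while the others are frozen and that the cycle may repeat for several rounds. The abstract confirms none of these details. All function names, parameter names, and module labels below are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(x, weight, bias):
    # weight is a list of rows, one row per output dimension.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def glu_adapter(x, w_value, b_value, w_gate, b_gate):
    # Assumed GLU form: (x W_v + b_v) * sigmoid(x W_g + b_g), elementwise.
    value = linear(x, w_value, b_value)
    gate = [sigmoid(g) for g in linear(x, w_gate, b_gate)]
    return [v * g for v, g in zip(value, gate)]


def trainable_schedule(num_rounds,
                       order=("classifier", "adapter", "embedding_model")):
    # Assumption: in each stage exactly one module is updated and the
    # others are frozen; the order follows the abstract's wording.
    schedule = []
    for round_index in range(num_rounds):
        for name in order:
            frozen = tuple(m for m in order if m != name)
            schedule.append((round_index, name, frozen))
    return schedule


if __name__ == "__main__":
    # Toy 2-dimensional "speaker embedding" passed through the adapter.
    embedding = [2.0, -4.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros_matrix = [[0.0, 0.0], [0.0, 0.0]]
    zeros = [0.0, 0.0]
    print(glu_adapter(embedding, identity, zeros, zeros_matrix, zeros))
    for stage in trainable_schedule(num_rounds=2):
        print(stage)
```

In a real system the adapter output would feed the speaker classifier, and each stage of the schedule would correspond to a training pass in which only the named module receives gradient updates.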

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders