TARNet: A Temporal-Aware Multi-Scale Architecture for Closed-Set Speaker Identification

Yassin Terraf; Youssef Iraqi

arXiv:2605.07735 · cs.SD · May 11, 2026

TL;DR

TARNet is a lightweight, multi-scale, temporal-aware neural architecture for closed-set speaker identification that models dependencies across various time scales to improve accuracy.

Contribution

It introduces a multi-stage temporal encoder with stage-specific dilation and an attentive pooling mechanism for enhanced speaker embedding extraction.
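
As a rough illustration of what stage-specific dilation buys, the sketch below runs three stages of stacked dilated 1-D convolutions over the same input and fuses their outputs frame by frame. It is not the authors' implementation: the dilation values in `STAGE_DILATIONS`, the all-ones `KERNEL`, the parallel wiring of the stages, and fusion by concatenation are all assumptions made for the example, and the helper names are hypothetical.

```python
# Hypothetical per-stage dilation settings; the paper's actual values are not given here.
STAGE_DILATIONS = {"short": (1, 2), "mid": (2, 4), "long": (4, 8)}
KERNEL = [1.0, 1.0, 1.0]  # placeholder weights; a real model would learn these


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, length-preserving dilated 1-D convolution over a list of floats."""
    k = len(kernel)
    assert k % 2 == 1, "this sketch only handles odd kernel sizes"
    pad = dilation * (k - 1) // 2  # symmetric padding keeps the output length equal to len(x)
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(kernel[j] * padded[t + j * dilation] for j in range(k))
        for t in range(len(x))
    ]


def receptive_field(kernel_size, dilations):
    """Number of input frames seen by one output frame after stacking the given layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def run_stage(x, dilations):
    """Apply one stage: a stack of dilated convolutions with that stage's dilations."""
    h = list(x)
    for d in dilations:
        h = dilated_conv1d(h, KERNEL, d)
    return h


def encode(x):
    """Run every stage on the same input and concatenate the results per frame (assumed fusion)."""
    stage_outputs = [run_stage(x, d) for d in STAGE_DILATIONS.values()]
    return [list(frame) for frame in zip(*stage_outputs)]
```

With a kernel size of 3, `receptive_field` gives 7, 13, and 25 frames for the three hypothetical stages, which is the sense in which a small dilation captures short-term detail and a large one captures longer context at the same parameter count.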

Findings

1. TARNet outperforms state-of-the-art methods on VoxCeleb1 and LibriSpeech datasets.
2. It maintains competitive computational complexity for practical deployment.
3. The code is publicly available at https://github.com/YassinTERRAF/TARNet.

Abstract

Closed-set speaker identification aims to assign a speech utterance to one of a predefined set of enrolled speakers and requires robust modeling of speaker-specific characteristics across multiple temporal scales. While recent deep learning approaches have achieved strong performance, many existing architectures provide limited mechanisms for modeling temporal dependencies across different time scales, which can restrict the effective use of complementary short-, mid-, and long-term speaker characteristics. In this paper, we propose TARNet, a lightweight Temporal-Aware Representation Network for closed-set speaker identification. TARNet explicitly models temporal information at multiple time scales using a multi-stage temporal encoder with stage-specific dilation configurations. The resulting multi-scale representations are fused and aggregated via an Attentive Statistics Pooling (ASP)…
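
For readers unfamiliar with Attentive Statistics Pooling, the following is a minimal sketch of the general technique, not of TARNet's specific layer. It assigns each frame an attention weight, then returns the attention-weighted mean and standard deviation of the frame features, so an utterance of any length becomes one fixed-size vector. The scoring function is simplified to a single hypothetical learned vector `v` applied after `tanh`, with no projection matrix or bias; the function name is also an assumption.

```python
import math


def attentive_stats_pooling(frames, v):
    """Pool T frame vectors of dimension D into one vector of dimension 2*D.

    frames: list of T lists of floats (frame-level features).
    v: hypothetical learned scoring vector of dimension D.
    """
    # One scalar attention score per frame (simplified scoring, see lead-in).
    scores = [sum(vi * math.tanh(hi) for vi, hi in zip(v, h)) for h in frames]

    # Numerically stable softmax over the time axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    dim = len(frames[0])
    # Attention-weighted mean of each feature dimension.
    mean = [sum(a * h[i] for a, h in zip(alphas, frames)) for i in range(dim)]
    # Attention-weighted variance, E[h^2] - E[h]^2, floored to avoid sqrt of a negative.
    var = [
        sum(a * h[i] ** 2 for a, h in zip(alphas, frames)) - mean[i] ** 2
        for i in range(dim)
    ]
    std = [math.sqrt(max(x, 1e-12)) for x in var]

    # Concatenate mean and standard deviation into the utterance-level vector.
    return mean + std
```

When `v` is all zeros the attention weights are uniform and the result reduces to plain mean and standard-deviation pooling; a non-zero `v` shifts the statistics toward the frames it scores highly.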
