TACNET: Temporal Audio Source Counting Network

Amirreza Ahmadnejad; Ahmad Mahmmodian Darviishani; Mohmmad Mehrdad Asadi; Sajjad Saffariyeh; Pedram Yousef; Emad Fatemizadeh

arXiv:2311.02369 · cs.SD · December 23, 2024 · 1 citation

Open Access

TL;DR

TaCNet is a novel deep learning architecture that directly processes raw audio for real-time speaker counting, achieving state-of-the-art accuracy across multiple languages and diverse scenarios.

Contribution

It introduces TaCNet, a new architecture that simplifies audio source counting by operating on raw audio and excels in real-time, multilingual applications.

Findings

01. Average accuracy of 74.18% over 11 classes
02. Effective in real-time speaker counting
03. Demonstrates cross-lingual adaptability

Abstract

In this paper, we introduce the Temporal Audio Source Counting Network (TaCNet), an innovative architecture that addresses limitations in audio source counting tasks. TaCNet operates directly on raw audio inputs, eliminating complex preprocessing steps and simplifying the workflow. Notably, it excels in real-time speaker counting, even with truncated input windows. Our extensive evaluation, conducted using the LibriCount dataset, underscores TaCNet's exceptional performance, positioning it as a state-of-the-art solution for audio source counting tasks. With an average accuracy of 74.18% over 11 classes, TaCNet demonstrates its effectiveness across diverse scenarios, including applications involving Chinese and Persian languages. This cross-lingual adaptability highlights its versatility and potential impact.
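
The abstract mentions two ideas that a short sketch may make more concrete: feeding truncated windows of raw audio to a counting model, and reporting an accuracy averaged over 11 classes. The Python below is only an illustration under stated assumptions, not the authors' code. The helper names `frame_waveform` and `mean_class_accuracy` are hypothetical, the labels are assumed to be the integers 0 to 10 (one per speaker count), and "average accuracy" is read here as the unweighted mean of per-class accuracies, which the abstract does not confirm.

```python
from collections import defaultdict

# Assumption: 11 classes, labelled 0..10 concurrent speakers.
NUM_CLASSES = 11


def frame_waveform(samples, window_size, hop_size):
    """Split a raw waveform into fixed-length windows.

    A trailing remainder shorter than window_size is dropped, so every
    window handed to a model has the same length.
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    last_start = len(samples) - window_size
    return [
        samples[start:start + window_size]
        for start in range(0, last_start + 1, hop_size)
    ]


def mean_class_accuracy(y_true, y_pred, num_classes=NUM_CLASSES):
    """Unweighted mean of per-class accuracies.

    Classes that never occur in y_true are skipped rather than counted
    as zero. This is one possible reading of "average accuracy over
    11 classes", not necessarily the metric used in the paper.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    per_class = [correct[c] / total[c] for c in range(num_classes) if total[c]]
    if not per_class:
        raise ValueError("y_true contains no labels in range(num_classes)")
    return sum(per_class) / len(per_class)
```

In a streaming setting, each window returned by `frame_waveform` would be passed to the counting model as it becomes available, and the per-window predictions could then be scored with `mean_class_accuracy` against the reference speaker counts.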

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis