The TCG CREST -- RKMVERI Submission for the NCIIPC Startup India AI Grand Challenge

Nikhil Raghav; Arnab Banerjee; Janojit Chakraborty; Avisek Gupta; Swami Punyeshwarananda; Md Sahidullah

arXiv:2512.11009 · cs.SD · December 15, 2025

Open Access

TL;DR

This paper presents a multilingual audio processing pipeline for speaker diarization, transcription, and translation, emphasizing robustness and real-world applicability in low-resource, multilingual, and code-mixed scenarios.

Contribution

It introduces a multi-kernel consensus spectral clustering framework and fine-tuned models that enhance speaker diarization and identification in challenging conditions.

Findings

01. Improved diarization performance across diverse recordings

02. Effective speaker and language identification in low-resource settings

03. Enhanced robustness through post-processing refinements

Abstract

In this report, we summarize the integrated multilingual audio processing pipeline developed by our team for the inaugural NCIIPC Startup India AI GRAND CHALLENGE, addressing Problem Statement 06: Language-Agnostic Speaker Identification and Diarisation, and subsequent Transcription and Translation System. Our primary focus was on advancing speaker diarization, a critical component for multilingual and code-mixed scenarios. The main intent of this work was to study the real-world applicability of our in-house speaker diarization (SD) systems. To this end, we investigated a robust voice activity detection (VAD) technique and fine-tuned speaker embedding models for improved speaker identification in low-resource settings. We leveraged our own recently proposed multi-kernel consensus spectral clustering framework, which substantially improved the diarization performance across all…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing