Textless NLP -- Zero Resource Challenge with Low Resource Compute
Krithiga Ramadass, Abrit Pal Singh, Srihari J, Sheetal Kalyani

TL;DR
This paper introduces a low-resource, efficient approach for Textless NLP that reduces training time and compute requirements while improving audio quality, with successful application to multiple languages including Tamil and Bengali.
Contribution
It proposes a novel combination of optimized hop length, tuned interpolation, and cyclic learning rate scheduling for faster training and better performance in Textless NLP tasks.
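As a rough illustration of what a cyclic learning rate schedule does, the sketch below implements the common triangular policy in plain Python. The function name, the learning-rate bounds, and the half-cycle length are illustrative assumptions; the summary does not state which cyclic variant or which values the authors used.

```python
def triangular_cyclic_lr(step, base_lr=1e-4, max_lr=1e-3, step_size=1000):
    """Triangular cyclic learning rate (illustrative values, not from the paper).

    The rate climbs linearly from base_lr to max_lr over `step_size` steps,
    then falls linearly back to base_lr over the next `step_size` steps,
    and the pattern repeats.
    """
    cycle_pos = step % (2 * step_size)  # position inside the current cycle
    if cycle_pos <= step_size:
        frac = cycle_pos / step_size        # rising half: 0 -> 1
    else:
        frac = 2.0 - cycle_pos / step_size  # falling half: 1 -> 0
    return base_lr + (max_lr - base_lr) * frac


if __name__ == "__main__":
    # Print the rate at a few points across two cycles.
    for s in (0, 500, 1000, 1500, 2000, 3000):
        print(s, triangular_cyclic_lr(s))
```

Periodically raising the rate lets the optimizer take larger steps early in each cycle, which is one plausible reason such schedules can reach a given quality in fewer training steps.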
Findings
Reduced training steps with improved performance.
Effective acoustic unit discovery for Indian languages.
Consistently good voice conversion results across datasets.
Abstract
This work addresses the persistent challenges of substantial training time and GPU resource requirements, which remain even when training lightweight encoder-vocoder models for Textless NLP. We reduce training steps significantly while improving performance by (a) leveraging learning rate schedulers for efficient and faster convergence, (b) optimizing hop length, and (c) tuning the interpolation scale factors for better audio quality. Additionally, we explore the latent space representation for Indian languages such as Tamil and Bengali for the acoustic unit discovery and voice conversion tasks. Our approach uses a quantized encoder architecture in conjunction with a vocoder that combines the proposed mixture of optimized hop length, tuned interpolation scale factors, and a cyclic learning rate scheduler. We obtain consistently good results across English, Tamil and Bengali datasets. The proposed…
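To make the role of hop length concrete, the sketch below shows how it sets the number of feature frames per utterance, and checks one common vocoder constraint: that the upsampling scale factors multiply back to the hop length. The sample rate, hop values, scale factors, and both helper names are illustrative assumptions. The abstract does not give the authors' settings or say that their vocoder follows this exact constraint.

```python
import math


def num_frames(num_samples, hop_length):
    """Approximate number of feature frames for a waveform (padding ignored)."""
    return math.ceil(num_samples / hop_length)


def upsampling_matches_hop(scale_factors, hop_length):
    """Check that per-stage upsampling factors multiply to the hop length.

    Assumption: the vocoder upsamples frame-rate features back to the
    waveform rate in stages, so the product of the stage factors must
    equal the hop length for output and input lengths to line up.
    """
    return math.prod(scale_factors) == hop_length


if __name__ == "__main__":
    one_second = 16000  # samples at a hypothetical 16 kHz sample rate
    for hop in (160, 320):
        print(hop, num_frames(one_second, hop))
    print(upsampling_matches_hop([10, 4, 4, 2], 320))
```

A larger hop length yields fewer frames per second, so the encoder and vocoder process shorter sequences, at the cost of coarser time resolution that the upsampling stages must then recover.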
Taxonomy
Topics: Semantic Web and Ontologies
