Sustainable self-supervised learning for speech representations

Luis Lugo; Valentin Vielzeuf

arXiv:2406.07696·cs.CL·June 13, 2024·1 cites

Open Access

TL;DR

This paper introduces a sustainable self-supervised model for speech representations that cuts memory usage and computing costs, and with them the environmental impact, while improving downstream error rates over a resource-efficient baseline.

Contribution

It presents a resource-efficient self-supervised learning approach for speech that combines optimizations in neural layers and training, reducing memory usage and estimated computing cost compared to existing large-scale models.

Findings

01. Reduces memory usage by an order of magnitude.

02. Achieves a reduction of nearly three orders of magnitude in computing costs.

03. Improves downstream task error rates over baseline models.

Abstract

Sustainable artificial intelligence focuses on data, hardware, and algorithms to make machine learning models more environmentally responsible. In particular, machine learning models for speech representations are computationally expensive, generating environmental concerns because of their high energy consumption. Thus, we propose a sustainable self-supervised model to learn speech representation, combining optimizations in neural layers and training to reduce computing costs. The proposed model improves over a resource-efficient baseline, reducing both memory usage and computing cost estimations. It pretrains using a single GPU in less than a day. On top of that, it improves the error rate performance of the baseline in downstream task evaluations. When comparing it to large speech representation approaches, there is an order of magnitude reduction in memory usage, while computing…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems