Toward Low-Cost End-to-End Spoken Language Understanding
Marco Dinarelli, Marco Naguib, François Portet

TL;DR
This paper explores ways to reduce the computation time and energy cost of end-to-end spoken language understanding models built on self-supervised (SSL) speech models, while keeping performance competitive on the FSC and MEDIA corpora.
Contribution
It compares several learning strategies for lowering training cost and provides an extensive analysis of the training time and electric energy consumption of SSL-based spoken language understanding models.
Findings
Cost reduction is achievable without sacrificing state-of-the-art accuracy.
Both training time and electric energy consumption, which the paper measures for its models, can be decreased.
The proposed methods perform well on FSC and MEDIA datasets.
Abstract
Recent advances in spoken language understanding benefited from Self-Supervised models trained on large speech corpora. For French, the LeBenchmark project has made such models available and has led to impressive progress on several tasks including spoken language understanding. These advances have a non-negligible cost in terms of computation time and energy consumption. In this paper, we compare several learning strategies trying to reduce such cost while keeping competitive performance. At the same time we propose an extensive analysis where we measure the cost of our models in terms of training time and electric energy consumption, hopefully promoting a comprehensive evaluation procedure. The experiments are performed on the FSC and MEDIA corpora, and show that it is possible to reduce the learning cost while maintaining state-of-the-art performance and using SSL models.
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
