TL;DR
This paper studies the adaptation of SincNet-based raw-waveform acoustic models from adults' to children's speech. Adapting only the SincNet filter parameters, a very small number of values, yields error rates comparable to techniques that use orders of magnitude more parameters.
Contribution
It shows that the parameterization of the SincNet layer is well suited to efficient adaptation of raw-waveform acoustic models from adults' to children's speech.
Findings
Efficient adaptation using a very small number of parameters
Error rates comparable to techniques using orders of magnitude more parameters
SincNet's parameterization facilitates practical adaptation
Abstract
Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better representations for a given scenario than hand-crafted features. SincNet has been proposed to reduce the number of parameters required in raw-waveform modelling, by restricting the filter functions, rather than having to learn every tap of each filter. We study the adaptation of the SincNet filter parameters from adults' to children's speech, and show that the parameterisation of the SincNet layer is well suited for adaptation in practice: we can efficiently adapt with a very small number of parameters, producing error rates comparable to techniques using orders of magnitude more parameters.
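The idea of restricting the filter functions instead of learning every tap can be illustrated with a minimal sketch. It assumes each filter is a band-pass defined by a low and a high cutoff frequency and built as the difference of two windowed sinc low-pass filters. The function names, the Hamming window, and the sizes used here (40 filters, 129 taps, 16 kHz sampling) are illustrative assumptions, not values taken from the paper.

```python
import math


def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass(f_low, f_high, num_taps, sample_rate):
    """Build a band-pass FIR filter from just two parameters (the cutoffs in Hz).

    The ideal response is the difference of two low-pass sinc filters; a
    Hamming window (an assumed, common choice) smooths the truncation.
    """
    if not 0.0 <= f_low < f_high <= sample_rate / 2:
        raise ValueError("need 0 <= f_low < f_high <= sample_rate / 2")
    f1 = f_low / sample_rate   # cutoffs as a fraction of the sampling rate
    f2 = f_high / sample_rate
    half = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - half  # centre the filter so it is symmetric
        ideal = 2 * f2 * sinc(2 * f2 * t) - 2 * f1 * sinc(2 * f1 * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def gain(taps, freq, sample_rate):
    """Magnitude of the filter's frequency response at `freq` Hz."""
    w = 2 * math.pi * freq / sample_rate
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)


# Hypothetical layer sizes, chosen only to compare parameter counts.
NUM_FILTERS = 40
NUM_TAPS = 129
SAMPLE_RATE = 16000

taps = sinc_bandpass(500.0, 1500.0, NUM_TAPS, SAMPLE_RATE)
passband_gain = gain(taps, 1000.0, SAMPLE_RATE)   # inside the band
stopband_gain = gain(taps, 4000.0, SAMPLE_RATE)   # well outside the band

sinc_layer_params = 2 * NUM_FILTERS         # two cutoffs per filter
free_layer_params = NUM_FILTERS * NUM_TAPS  # every tap learned separately
```

Under these assumptions, every one of the 129 taps is determined by the two cutoff values, so adapting a filter to a new domain means updating two numbers rather than the whole impulse response.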
