Performance and Complexity Trade-off Optimization of Speech Models During Training
Esteban Gómez, Tom Bäckström

TL;DR
This paper introduces a novel reparameterization method that allows simultaneous optimization of speech model performance and computational complexity during training, avoiding heuristic pruning.
Contribution
It proposes a feature noise injection technique enabling joint training for performance and complexity trade-offs, unlike traditional post hoc pruning methods.
Findings
Effective in voice activity detection
Improves audio anti-spoofing models
Dynamically balances model size and accuracy
Abstract
In speech machine learning, neural network models are typically designed by choosing an architecture with fixed layer sizes and structure. These models are then trained to maximize performance on metrics aligned with the task's objective. While the overall architecture is usually guided by prior knowledge of the task, the sizes of individual layers are often chosen heuristically. However, this approach does not guarantee an optimal trade-off between performance and computational complexity; consequently, post hoc methods such as weight quantization or model pruning are typically employed to reduce computational cost. This occurs because stochastic gradient descent (SGD) methods can only optimize differentiable functions, while factors influencing computational complexity, such as layer sizes and floating-point operations per second (FLOP/s), are non-differentiable and require modifying…
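The abstract's central point is that layer width enters the computational cost as an integer, so SGD cannot adjust it directly. The sketch below makes this concrete for a single dense layer. First it counts FLOPs per forward pass, which is a step function of the integer width. It then shows one common, generic relaxation: each output unit becomes a gate kept with probability σ(z_i), which turns the expected cost into a smooth function of the logits z_i. This is a hypothetical illustration only. The gating scheme, the function names (`dense_flops`, `soft_flops`, `soft_flops_grad`) and the plain-Python setting are assumptions. It is not the paper's feature-noise reparameterization, whose details are not given in this excerpt.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping a real logit to a keep-probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense_flops(n_in: int, n_out: int) -> int:
    """FLOPs for one forward pass of y = W x + b.

    Each of the n_out outputs needs n_in multiplies and n_in adds
    (n_in - 1 for the dot product plus 1 for the bias). The result depends
    on the integer width n_out, so it has no useful gradient.
    """
    return 2 * n_in * n_out


def soft_flops(n_in: int, logits: list[float]) -> float:
    """Expected FLOPs when output unit i is kept with probability sigmoid(logits[i]).

    Hypothetical relaxation: the expected width is the sum of gate
    probabilities, which is a smooth function of the logits.
    """
    return 2 * n_in * sum(sigmoid(z) for z in logits)


def soft_flops_grad(n_in: int, logits: list[float]) -> list[float]:
    """Analytic gradient of soft_flops with respect to each logit.

    d/dz_i [2 * n_in * sigmoid(z_i)] = 2 * n_in * sigmoid(z_i) * (1 - sigmoid(z_i)).
    """
    return [2 * n_in * sigmoid(z) * (1.0 - sigmoid(z)) for z in logits]


if __name__ == "__main__":
    # Hard count for a 64 -> 32 layer: 2 * 64 * 32 = 4096 FLOPs.
    print(dense_flops(64, 32))
    # Two undecided gates (logit 0 -> probability 0.5 each): 2 * 64 * 1.0 = 128.0.
    print(soft_flops(64, [0.0, 0.0]))
    # Each undecided gate receives gradient 2 * 64 * 0.25 = 32.0.
    print(soft_flops_grad(64, [0.0, 0.0]))
```

In a training loop, a term such as `lam * soft_flops(...)` would be added to the task loss, where `lam` is a hypothetical trade-off weight. Gradient descent could then push unhelpful gates toward zero, and the relaxed cost approaches `dense_flops` as all gates saturate at one.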
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Stochastic Gradient Optimization Techniques
