ESPnet-ONNX: Bridging a Gap Between Research and Production
Masao Someki, Yosuke Higuchi, Tomoki Hayashi, Shinji Watanabe

TL;DR
This paper presents ESPnet-ONNX, a set of techniques to optimize speech processing models for deployment, achieving significant speedups without retraining, thus bridging the gap between research models and production needs.
Contribution
The authors introduce a procedure for converting and optimizing ESPnet speech models into ONNX format, enabling faster inference suitable for production environments.
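One of the optimizations the paper names is fusing nodes in a graph. The sketch below shows the general idea on a toy graph and is not ESPnet-ONNX's actual code. The `Node` class, the `fuse_matmul_add` function, and the `FusedMatMulAdd` op name are hypothetical, and the sketch assumes the node list is topologically sorted and that the intermediate tensor is not itself a graph output.

```python
from dataclasses import dataclass


@dataclass
class Node:
    op: str        # operator name, e.g. "MatMul"
    inputs: list   # names of input tensors
    output: str    # name of the single output tensor


def fuse_matmul_add(nodes):
    """Replace each MatMul -> Add pair with one hypothetical FusedMatMulAdd node.

    A pair is fused only when the Add is the sole consumer of the MatMul output.
    """
    # Count how many times each tensor is consumed.
    consumers = {}
    for n in nodes:
        for name in n.inputs:
            consumers[name] = consumers.get(name, 0) + 1
    # Map each tensor name to the index of the node producing it.
    producer = {n.output: i for i, n in enumerate(nodes)}

    dropped = set()     # indices of MatMul nodes absorbed into a fused node
    replacement = {}    # index of an Add node -> fused node that replaces it
    for j, add in enumerate(nodes):
        if add.op != "Add":
            continue
        for name in add.inputs:
            i = producer.get(name)
            if (i is not None and i not in dropped
                    and nodes[i].op == "MatMul" and consumers[name] == 1):
                bias = [x for x in add.inputs if x != name]
                replacement[j] = Node("FusedMatMulAdd",
                                      nodes[i].inputs + bias, add.output)
                dropped.add(i)
                break
    # The fused node sits at the Add's position, so all inputs are available.
    return [replacement.get(k, n) for k, n in enumerate(nodes) if k not in dropped]


graph = [
    Node("MatMul", ["x", "W"], "h"),
    Node("Add", ["h", "b"], "y"),
    Node("Relu", ["y"], "out"),
]
fused_graph = fuse_matmul_add(graph)
print([n.op for n in fused_graph])
```

Fewer nodes means fewer kernel launches and fewer intermediate tensors at inference time, which is the usual motivation for this kind of rewrite.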
Findings
Achieved 1.3-2× speedup in speech tasks
Maintained model performance after optimization
Provided publicly available tools for deployment
Abstract
In the field of deep learning, researchers often focus on inventing novel neural network models and improving benchmarks. In contrast, application developers are interested in making models suitable for actual products, which involves optimizing a model for faster inference and adapting a model to various platforms (e.g., C++ and Python). In this work, to fill the gap between the two, we establish an effective procedure for optimizing a PyTorch-based research-oriented model for deployment, taking ESPnet, a widely used toolkit for speech processing, as an instance. We introduce different techniques to ESPnet, including converting a model into an ONNX format, fusing nodes in a graph, and quantizing parameters, which lead to approximately 1.3-2× speedup in various tasks (i.e., ASR, TTS, speech translation, and spoken language understanding) while keeping its performance without any…
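Parameter quantization, one of the techniques listed in the abstract, can be illustrated with a minimal sketch. The example assumes symmetric per-tensor int8 quantization, which is one common scheme and not necessarily the one ESPnet-ONNX uses. The function names `quantize_symmetric_int8` and `dequantize` are hypothetical.

```python
def quantize_symmetric_int8(weights):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(w) for w in weights)
    # Avoid division by zero for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [x * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q_weights, scale = quantize_symmetric_int8(weights)
restored = dequantize(q_weights, scale)

# The round-trip error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, scale, max_error)
```

Storing 8-bit integers plus one scale in place of 32-bit floats shrinks the parameters roughly fourfold and allows integer arithmetic, at the cost of the small rounding error computed above.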
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
