ESPnet-ONNX: Bridging a Gap Between Research and Production

Masao Someki; Yosuke Higuchi; Tomoki Hayashi; Shinji Watanabe

arXiv:2209.09756·eess.AS·November 15, 2022

TL;DR

This paper presents ESPnet-ONNX, a set of techniques for optimizing ESPnet speech processing models for deployment. It reports approximately 1.3-2× speedups without retraining, narrowing the gap between research models and production needs.

Contribution

The authors introduce a procedure for converting ESPnet speech models to the ONNX format and optimizing them through node fusion and parameter quantization, enabling faster inference suitable for production environments.
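To make the graph-level side of this procedure concrete, here is a minimal sketch of node fusion on a toy computation graph. It is not taken from ESPnet-ONNX: the `Node` class, the `fuse_matmul_add` helper, and the choice of the MatMul + Add to Gemm pattern are assumptions made purely for illustration, and the sketch only inspects adjacent nodes in an already ordered list.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a graph node (not an ONNX or ESPnet-ONNX class)."""
    op: str        # operator type, e.g. "MatMul"
    inputs: list   # names of the input tensors
    output: str    # name of the single output tensor


def fuse_matmul_add(nodes):
    """Replace each adjacent MatMul -> Add pair with one Gemm node.

    The pair is fused only when the MatMul output is consumed exactly once
    (by that Add), so no other node loses its input.
    """
    # Count how many times each tensor name is consumed in the graph.
    consumers = {}
    for node in nodes:
        for name in node.inputs:
            consumers[name] = consumers.get(name, 0) + 1

    fused, skip = [], set()
    for i, node in enumerate(nodes):
        if i in skip:
            continue
        nxt = nodes[i + 1] if i + 1 < len(nodes) else None
        if (node.op == "MatMul" and nxt is not None and nxt.op == "Add"
                and node.output in nxt.inputs
                and consumers.get(node.output, 0) == 1):
            # The Add's remaining input plays the role of the bias.
            bias = [x for x in nxt.inputs if x != node.output]
            fused.append(Node("Gemm", node.inputs + bias, nxt.output))
            skip.add(i + 1)
        else:
            fused.append(node)
    return fused


# Toy two-layer network: (x @ W1 + b1) -> Relu -> (@ W2 + b2)
graph = [
    Node("MatMul", ["x", "W1"], "h0"),
    Node("Add", ["h0", "b1"], "h1"),
    Node("Relu", ["h1"], "h2"),
    Node("MatMul", ["h2", "W2"], "h3"),
    Node("Add", ["h3", "b2"], "y"),
]
optimized = fuse_matmul_add(graph)
print([n.op for n in optimized])  # prints ['Gemm', 'Relu', 'Gemm']
```

Five nodes become three while the graph's final output tensor `y` is unchanged. Fewer nodes generally means fewer kernel launches and intermediate tensors at inference time, which is the general motivation for fusion.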

Findings

01 Achieved an approximately 1.3-2× inference speedup across ASR, TTS, speech translation, and spoken language understanding

02 Maintained model performance after optimization

03 Provided publicly available tools for deployment

Abstract

In the field of deep learning, researchers often focus on inventing novel neural network models and improving benchmarks. In contrast, application developers are interested in making models suitable for actual products, which involves optimizing a model for faster inference and adapting a model to various platforms (e.g., C++ and Python). In this work, to fill the gap between the two, we establish an effective procedure for optimizing a PyTorch-based research-oriented model for deployment, taking ESPnet, a widely used toolkit for speech processing, as an instance. We introduce different techniques to ESPnet, including converting a model into an ONNX format, fusing nodes in a graph, and quantizing parameters, which lead to approximately 1.3-2× speedup in various tasks (i.e., ASR, TTS, speech translation, and spoken language understanding) while keeping its performance without any…
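As a rough illustration of what "quantizing parameters" can mean, the sketch below applies symmetric per-tensor int8 quantization to a small list of weights and measures the round-trip error. The abstract does not specify which quantization scheme ESPnet-ONNX uses, so the scheme, the helper names `quantize_int8` and `dequantize`, and the sample weights are all assumptions for illustration only.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8-range integers plus a scale.

    Generic textbook scheme, assumed here for illustration; not necessarily
    the scheme used by ESPnet-ONNX.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [v * scale for v in quantized]


# Hypothetical weights standing in for one parameter tensor.
weights = [0.50, -1.27, 0.03, 0.90, -0.42]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding to the nearest integer bounds the per-weight error by scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("int8 values:", q)
print("scale:", scale)
print("max round-trip error:", max_err)
```

Each weight is stored in 8 bits instead of 32, at the cost of a bounded rounding error. Whether that error leaves task accuracy intact has to be checked empirically per model, which is the kind of evaluation the paper reports.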

Code & Models

Repositories

espnet/espnet_onnx (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques