Using Python for Model Inference in Deep Learning
Zachary DeVito, Jason Ansel, Will Constable, Michael Suo, Ailing Zhang, Kim Hazelwood

TL;DR
This paper introduces a method to perform scalable deep learning inference directly in Python by using multiple interpreters and a new container format, eliminating the need for model extraction and improving deployment flexibility.
Contribution
The paper combines multiple Python interpreters running within a single process with a new model container format, which simplifies deployment and makes inference scalable without a model extraction step.
Findings
Performance comparable to TorchScript for large models
Inference stays scalable for smaller models, where Python overhead is noticeable, by using multiple interpreters
Simplified deployment process without model extraction
Abstract
Python has become the de-facto language for training deep neural networks, coupling a large suite of scientific computing libraries with efficient libraries for tensor computation such as PyTorch or TensorFlow. However, when models are used for inference they are typically extracted from Python as TensorFlow graphs or TorchScript programs in order to meet performance and packaging constraints. The extraction process can be time consuming, impeding fast prototyping. We show how it is possible to meet these performance and packaging constraints while performing inference in Python. In particular, we present a way of using multiple Python interpreters within a single process to achieve scalable inference and describe a new container format for models that contains both native Python code and data. This approach simplifies the model deployment story by eliminating the model extraction step,…
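The idea of a model container that holds both Python code and data can be illustrated with a minimal sketch. This is not the paper's container format: it assumes a plain zip archive, and the file names (`model.py`, `weights.json`), the helper functions, and the toy linear model are all hypothetical, chosen only to show code and data travelling together in one archive and being loaded back without a separate extraction step.

```python
import io
import json
import types
import zipfile

# Hypothetical model code, stored in the archive as ordinary Python source.
MODEL_SOURCE = '''
def predict(weights, xs):
    """Toy linear model: dot product of weights and inputs plus a bias."""
    return sum(w * x for w, x in zip(weights["w"], xs)) + weights["b"]
'''


def save_container(buffer, source, weights):
    """Write Python source and its data side by side into one zip archive."""
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("model.py", source)
        archive.writestr("weights.json", json.dumps(weights))


def load_container(buffer):
    """Rebuild a module from the archived source and load the archived data."""
    with zipfile.ZipFile(buffer) as archive:
        source = archive.read("model.py").decode("utf-8")
        weights = json.loads(archive.read("weights.json"))
    # Execute the source inside a fresh module object (assumed name).
    module = types.ModuleType("packaged_model")
    exec(compile(source, "model.py", "exec"), module.__dict__)
    return module, weights


# Round trip through an in-memory archive.
buffer = io.BytesIO()
save_container(buffer, MODEL_SOURCE, {"w": [1.0, 2.0], "b": 0.5})
buffer.seek(0)
model, weights = load_container(buffer)
result = model.predict(weights, [3.0, 4.0])
print(result)
```

The sketch stores the data as JSON rather than pickled objects only to keep the example small and safe to run; a real container for tensors would need a binary format and a controlled way of resolving imports, which this example does not attempt.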
Taxonomy
Topics: Computational Physics and Python Applications · Parallel Computing and Optimization Techniques · Topic Modeling
