Analysis of Failures and Risks in Deep Learning Model Converters: A Case Study in the ONNX Ecosystem
Purvish Jajal, Wenxin Jiang, Arav Tewari, Erik Kocinare, Joseph Woo, Anusha Sarraf, Yung-Hsiang Lu, George K. Thiruvathukal, James C. Davis

TL;DR
This study investigates failure modes in deep learning (DL) model converters, focusing on the ONNX ecosystem. It finds that most defects occur during node conversion and that many failures produce semantically incorrect models, highlighting where converters need improvement.
Contribution
It provides a detailed failure analysis of DL model converters in the ONNX ecosystem and formulates hypotheses about the structural causes of these failures to guide future research.
Findings
~75% of defects occur during the node conversion stage
33% of failures involve semantically incorrect models
Models with behavior inconsistencies share operator sequences
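The third finding can be illustrated with a small sketch of one way to look for operator sequences shared by models with behavior inconsistencies. It assumes each model is flattened to a list of ONNX operator types in topological order (real ONNX graphs are DAGs, so this is a simplification), and the function names and operator lists are hypothetical; it is not necessarily how the authors tested their hypotheses.

```python
from typing import Iterable

# Hypothetical representation: a model is flattened to its operator types
# in topological order; an n-gram is a contiguous run of n operators.
OpSeq = tuple[str, ...]


def ngrams(ops: list[str], n: int) -> set[OpSeq]:
    """Return every contiguous run of n operators in one model."""
    return {tuple(ops[i:i + n]) for i in range(len(ops) - n + 1)}


def shared_sequences(models: Iterable[list[str]], n: int = 2) -> set[OpSeq]:
    """Return the operator n-grams that appear in every given model."""
    per_model = [ngrams(ops, n) for ops in models]
    if not per_model:
        return set()
    return set.intersection(*per_model)


def distinguishing_sequences(
    inconsistent: list[list[str]], consistent: list[list[str]], n: int = 2
) -> set[OpSeq]:
    """N-grams shared by all inconsistent models but absent from every consistent one."""
    candidates = shared_sequences(inconsistent, n)
    for ops in consistent:
        candidates -= ngrams(ops, n)
    return candidates


# Hypothetical operator lists, not data from the study.
inconsistent = [
    ["Conv", "BatchNormalization", "Relu", "Resize", "Add"],
    ["MatMul", "Add", "Resize", "Add", "Softmax"],
]
consistent = [
    ["Conv", "Relu", "MaxPool"],
    ["MatMul", "Add", "Softmax"],
]

print(distinguishing_sequences(inconsistent, consistent))  # prints {('Resize', 'Add')}
```

A shared sequence is only a candidate explanation: co-occurrence in faulty models suggests where to look, but confirming a structural cause requires further testing.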
Abstract
Software engineers develop, fine-tune, and deploy deep learning (DL) models using a variety of development frameworks and runtime environments. DL model converters move models between frameworks and to runtime environments. Conversion errors compromise model quality and disrupt deployment. However, the failure characteristics of DL model converters are unknown, adding risk when using DL interoperability technologies. This paper analyzes failures in DL model converters. We survey software engineers about DL interoperability tools, use cases, and pain points (N=92). Then, we characterize failures in model converters associated with the main interoperability tool, ONNX (N=200 issues in PyTorch and TensorFlow). Finally, we formulate and test two hypotheses about structural causes for the failures we studied. We find that the node conversion stage of a model converter accounts for ~75% of…
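Semantically incorrect models convert without error but behave differently from the original, so they are typically caught by differential testing: run the source model and the converted model on the same inputs and compare outputs within a tolerance. The sketch below is a minimal, framework-free illustration under stated assumptions: plain Python functions stand in for the two models (in practice the converted model would run in an ONNX runtime such as ONNX Runtime), and the dropped-`alpha` LeakyReLU defect, the tolerances, and all function names are hypothetical.

```python
import math
from typing import Callable, Sequence

# A "model" here is any function from an input vector to an output vector.
Model = Callable[[Sequence[float]], list[float]]


def outputs_match(a: Sequence[float], b: Sequence[float],
                  rel_tol: float = 1e-5, abs_tol: float = 1e-6) -> bool:
    """True if both outputs have the same length and agree elementwise within tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol) for x, y in zip(a, b)
    )


def find_mismatches(reference: Model, converted: Model,
                    inputs: list[list[float]]) -> list[list[float]]:
    """Return the inputs on which the converted model diverges from the reference."""
    return [x for x in inputs if not outputs_match(reference(x), converted(x))]


def leaky_relu(xs: Sequence[float], alpha: float) -> list[float]:
    return [x if x >= 0 else alpha * x for x in xs]


def reference_model(xs: Sequence[float]) -> list[float]:
    # Stand-in for the source-framework model: LeakyReLU with alpha = 0.01.
    return leaky_relu(xs, alpha=0.01)


def converted_model(xs: Sequence[float]) -> list[float]:
    # Hypothetical node-conversion defect: the alpha attribute is lost,
    # so the converted node behaves like a plain ReLU.
    return leaky_relu(xs, alpha=0.0)


test_inputs = [[1.0, 2.0], [-1.0, 3.0], [0.0, 0.5]]
print(find_mismatches(reference_model, converted_model, test_inputs))  # prints [[-1.0, 3.0]]
```

Only the input containing a negative value exposes the defect, which is why the choice of test inputs largely determines whether a semantic conversion error is caught.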
Taxonomy
Topics: Software System Performance and Reliability · Software Testing and Debugging Techniques · Software Engineering Research
