DeepGalaxy: Testing Neural Network Verifiers via Two-Dimensional Input Space Exploration
Xuan Xie, Fuyuan Zhang

TL;DR
DeepGalaxy is an automated differential testing framework for neural network verifiers. It explores their two-dimensional input space (the model and the specification) to find bugs in the verifiers themselves, which matters most where verification results are relied on in safety-critical applications.
Contribution
It introduces mutation rules at the model level and the specification level, together with heuristic strategies for selecting test cases, and uses them to reveal previously unknown bugs in state-of-the-art verifiers.
Findings
Discovered five previously unknown bugs in neural network verifiers
Demonstrated efficiency and effectiveness of DeepGalaxy in testing verifiers
Validated the approach on three leading neural network verification tools
Abstract
Deep neural networks (DNNs) are widely developed and applied in many areas, and the quality assurance of DNNs is critical. Neural network verification (NNV) aims to provide formal guarantees for DNN models. Like traditional software, neural network verifiers can also contain bugs, which could have a serious impact, especially in safety-critical areas. However, little work exists on validating neural network verifiers. In this work, we propose DeepGalaxy, an automated approach based on differential testing to tackle this problem. Specifically, we (1) propose a set of mutation rules, including model-level mutation and specification-level mutation, to effectively explore the two-dimensional input space of neural network verifiers; and (2) propose heuristic strategies to select test cases. We leveraged our implementation of DeepGalaxy to test three state-of-the-art…
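The differential testing idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy scalar "model" format, the `(lo, hi, bound)` specification, the two toy verifiers (one with a deliberately seeded bug), and the mutation functions. The paper's actual mutation rules and heuristic test case selection are not modeled here; mutations are simply chosen at random.

```python
import random

# Toy setting (all names hypothetical): a "model" is a chain of scalar
# layers x -> relu(w * x + b); a "spec" (lo, hi, bound) claims that the
# output is <= bound for every input in [lo, hi].

def relu(x):
    return max(0.0, x)

def interval_verifier(model, spec):
    """Reference verifier: interval propagation through the scalar layers."""
    lo, hi, bound = spec
    for w, b in model:
        a, c = w * lo + b, w * hi + b
        lo, hi = relu(min(a, c)), relu(max(a, c))
    return hi <= bound

def buggy_verifier(model, spec):
    """Same analysis with a seeded bug: it ignores the sign of w."""
    lo, hi, bound = spec
    for w, b in model:
        lo, hi = relu(w * lo + b), relu(w * hi + b)  # wrong when w < 0
    return hi <= bound

def mutate_model(model, rng):
    """Model-level mutation (illustrative): perturb one layer's weight."""
    i = rng.randrange(len(model))
    w, b = model[i]
    mutated = list(model)
    mutated[i] = (w + rng.uniform(-1.0, 1.0), b)
    return mutated

def mutate_spec(spec, rng):
    """Specification-level mutation (illustrative): widen the input
    interval or shift the output bound."""
    lo, hi, bound = spec
    if rng.random() < 0.5:
        d = rng.uniform(0.0, 0.5)
        return (lo - d, hi + d, bound)
    return (lo, hi, bound + rng.uniform(-0.5, 0.5))

def differential_test(verifiers, model, spec, rounds=200, seed=0):
    """Mutate along both input dimensions and record every test case on
    which the verifiers return different verdicts."""
    rng = random.Random(seed)
    disagreements = []
    for _ in range(rounds):
        if rng.random() < 0.5:
            model = mutate_model(model, rng)
        else:
            spec = mutate_spec(spec, rng)
        verdicts = [v(model, spec) for v in verifiers]
        if len(set(verdicts)) > 1:
            disagreements.append((model, spec, verdicts))
    return disagreements
```

Calling `differential_test([interval_verifier, buggy_verifier], [(1.0, 0.0)], (-1.0, 1.0, 1.0))` returns the mutated model and specification pairs on which the two toy verifiers disagree. A disagreement only flags a candidate bug: deciding which verifier is wrong still requires inspecting the test case.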
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
