What Programs Want: Automatic Inference of Input Data Specifications
Caterina Urban

TL;DR
This paper introduces a static analysis framework that automatically infers input data specifications for data-processing programs. These specifications are necessary conditions on the structure and values of the data, and they help identify potential errors in data handling.
Contribution
It presents a novel static shape analysis for inferring input data requirements. The analysis extends existing abstract domains so that they reason indirectly about the input data of Python programs, rather than only about program variables.
Findings
Infers constraints on input data for a number of example programs
Detects potential data-related errors automatically
Implemented in an open-source static analyzer for Python programs
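One way an inferred specification can expose data errors before a program runs is to check concrete input against it. The following is a minimal Python sketch, not the paper's tool. It assumes a hypothetical, hand-written specification expressed as (description, predicate) pairs over the input lines. The format, the `SPEC` contents, and the names `parses_as_int` and `violations` are illustrative assumptions.

```python
from typing import Callable, List, Tuple

# Hypothetical specification format: (human-readable condition, predicate over all input lines).
Spec = List[Tuple[str, Callable[[List[str]], bool]]]


def parses_as_int(s: str) -> bool:
    """True if int(s) succeeds (int() tolerates surrounding whitespace)."""
    try:
        int(s)
        return True
    except ValueError:
        return False


# Hand-written example (an assumption, not output of the paper's analyzer) of necessary
# conditions for a program that reads two lines, converts both with int(),
# and divides by int(line 0) - 5.
SPEC: Spec = [
    ("at least 2 input lines", lambda lines: len(lines) >= 2),
    ("input[0] parses as int", lambda lines: len(lines) > 0 and parses_as_int(lines[0])),
    ("input[1] parses as int", lambda lines: len(lines) > 1 and parses_as_int(lines[1])),
    # Only constrains input[0] when it is an int; a non-int is already reported above.
    ("input[0] is not 5",
     lambda lines: not (len(lines) > 0 and parses_as_int(lines[0]) and int(lines[0]) == 5)),
]


def violations(spec: Spec, lines: List[str]) -> List[str]:
    """Return the description of every condition in `spec` that `lines` violates."""
    return [desc for desc, holds in spec if not holds(lines)]


print(violations(SPEC, ["7", "3"]))  # []
print(violations(SPEC, ["5", "x"]))  # ['input[1] parses as int', 'input[0] is not 5']
```

The conditions are necessary rather than sufficient, so a reported violation means the program is certain to fail on that data. An empty result does not guarantee that the program succeeds.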
Abstract
Nowadays, as machine-learned software quickly permeates our society, we are becoming increasingly vulnerable to programming errors in the data pre-processing or training software, as well as to errors in the data itself. In this paper, we propose a static shape analysis framework for the input data of data-processing programs. Our analysis automatically infers necessary conditions on the structure and values of the data read by a data-processing program. Our framework builds on a family of underlying abstract domains, extended to reason indirectly about the input data rather than simply about the program variables. The choice of these abstract domains is a parameter of the analysis. We describe various instances built from existing abstract domains. The proposed approach is implemented in an open-source static analyzer for Python programs. We demonstrate its potential on a number of…
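The following deliberately tiny Python sketch illustrates what "necessary conditions on the values of the data read" can look like. It is not the paper's analysis and uses none of its abstract domains. The statement encoding, the function name `infer_input_spec`, and the single hand-rolled abstraction (a variable either equals `int(input[i]) + c` or is untracked) are all hypothetical. The sketch walks a straight-line toy program forward. It records the conditions under which `int()` and `/` cannot fail, and it states them in terms of the input lines rather than the program variables.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy statements (straight-line code only, no branches or loops):
#   ("read", v)        v = input()     reads the next raw input line
#   ("int", d, s)      d = int(s)
#   ("add", d, s, c)   d = s + c       c is an int constant
#   ("div", d, a, b)   d = a / b


def infer_input_spec(program: List[tuple]) -> List[str]:
    """Return necessary conditions on the input lines for `program` not to crash."""
    raw: Dict[str, int] = {}  # variable -> index of the raw input line it holds
    num: Dict[str, Optional[Tuple[int, int]]] = {}  # variable -> (i, c): value is int(input[i]) + c
    spec: List[str] = []
    n_read = 0

    def assign(var: str) -> None:
        # A (re)assigned variable forgets its previous abstract value.
        raw.pop(var, None)
        num.pop(var, None)

    for stmt in program:
        op = stmt[0]
        if op == "read":
            _, v = stmt
            assign(v)
            raw[v] = n_read
            n_read += 1
        elif op == "int":
            _, d, s = stmt
            src_raw, src_num = raw.get(s), num.get(s)  # read the source before overwriting d
            assign(d)
            if src_raw is not None:
                # int() raises ValueError unless the line holds an integer.
                spec.append(f"input[{src_raw}] parses as int")
                num[d] = (src_raw, 0)
            else:
                num[d] = src_num
        elif op == "add":
            _, d, s, c = stmt
            src = num.get(s)
            assign(d)
            num[d] = (src[0], src[1] + c) if src is not None else None
        elif op == "div":
            _, d, a, b = stmt
            den = num.get(b)
            assign(d)
            if den is not None:
                # a / b raises ZeroDivisionError when int(input[i]) + c == 0.
                i, c = den
                spec.append(f"input[{i}] != {-c}")
            num[d] = None
        else:
            raise ValueError(f"unknown statement: {stmt!r}")
    return list(dict.fromkeys(spec))  # drop duplicates, keep order


# Toy encoding of: a = input(); x = int(a); y = x - 5; b = input(); z = int(b); q = z / y
example = [("read", "a"), ("int", "x", "a"), ("add", "y", "x", -5),
           ("read", "b"), ("int", "z", "b"), ("div", "q", "z", "y")]
print(infer_input_spec(example))
# ['input[0] parses as int', 'input[1] parses as int', 'input[0] != 5']
```

Because the toy programs have no branches, every statement executes, so each recorded condition must hold for a run to succeed. Handling branches, loops, and richer data shapes soundly is the job of the pluggable abstract domains described in the abstract.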
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Software Testing and Debugging Techniques
