Learning Program Behavioral Models from Synthesized Input-Output Pairs
Tural Mammadov, Dietrich Klakow, Alexander Koller, Andreas Zeller

TL;DR
This paper presents Modelizer, a neural framework that learns reversible, differentiable models of black-box programs from input-output pairs, enabling accurate output prediction and input synthesis for program understanding.
Contribution
It introduces a novel, grammar-based sequence-to-sequence learning approach that models program behavior from input-output data alone, without access to the program's code, and that can also be run in reverse to predict inputs from outputs.
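The grammar-based part of the approach generates program inputs by expanding an input grammar. Below is a minimal sketch of random grammar expansion. The toy Markdown-subset grammar and the `expand` and `synthesize_inputs` helpers are hypothetical illustrations, not Modelizer's actual grammars or code.

```python
import random

# Hypothetical toy grammar for a tiny Markdown subset (an assumption, not
# the grammar used by Modelizer). Nonterminals are written as <name>.
GRAMMAR = {
    "<start>": [["<block>"], ["<block>", "\n", "<start>"]],
    "<block>": [["# ", "<text>"], ["<text>"], ["- ", "<text>"]],
    "<text>": [["<word>"], ["<word>", " ", "<text>"], ["*", "<word>", "*"]],
    "<word>": [["foo"], ["bar"], ["baz"]],
}


def expand(symbol, rng, depth=0, max_depth=8):
    """Randomly derive a string from `symbol`.

    Once `max_depth` is reached, only the shortest alternative is allowed,
    which in this grammar always leads to terminals, so derivation ends.
    """
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        alternatives = [min(alternatives, key=len)]
    chosen = rng.choice(alternatives)
    return "".join(expand(s, rng, depth + 1, max_depth) for s in chosen)


def synthesize_inputs(n, seed=0):
    """Produce `n` grammar-conforming inputs with a reproducible seed."""
    rng = random.Random(seed)
    return [expand("<start>", rng) for _ in range(n)]


if __name__ == "__main__":
    for sample in synthesize_inputs(3):
        print(repr(sample))
```

Seeding a private `random.Random` instance keeps the generated corpus reproducible. The depth cap is one simple way to guarantee that recursive rules such as `<start>` and `<text>` terminate.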
Findings
Achieves up to 95.4% accuracy in mocking program behavior
Models require fewer than 6.3 million parameters for languages like Markdown and HTML
Demonstrates effective behavior prediction and input synthesis for program analysis
Abstract
We introduce Modelizer, a novel framework that, given a black-box program, learns a model from its input/output behavior using neural machine translation algorithms. The resulting model mocks the original program: given an input, the model predicts the output that would have been produced by the program. Moreover, the model is reversible; that is, the model can predict the input that would have produced a given output. Finally, the model is differentiable and can be efficiently restricted to predict only a certain aspect of the program behavior. Modelizer uses grammars to synthesize inputs and unsupervised tokenizers to decompose the resulting outputs, allowing it to learn sequence-to-sequence associations between token streams. Other than input grammars, Modelizer only requires the ability to execute the program. The resulting models are small, requiring fewer than 6.3…
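The abstract describes a pipeline with three steps: execute the program on synthesized inputs, tokenize both sides, and learn sequence-to-sequence associations in either direction. The sketch below covers only the data-preparation half, under several assumptions. `toy_markdown_to_html` is a hypothetical stand-in for the black-box program; in practice one would execute the real program, for example via `subprocess`. A fixed regular-expression tokenizer replaces the unsupervised tokenizers the paper uses. `build_pairs` is likewise a hypothetical helper.

```python
import re


def toy_markdown_to_html(src):
    """Hypothetical stand-in for the black-box program under test."""
    lines = []
    for line in src.split("\n"):
        if line.startswith("# "):
            lines.append(f"<h1>{line[2:]}</h1>")
        elif line.startswith("- "):
            lines.append(f"<li>{line[2:]}</li>")
        else:
            lines.append(f"<p>{line}</p>")
    return "\n".join(lines)


# Simplified fixed tokenizer (an assumption; Modelizer uses unsupervised
# tokenizers). Matches HTML tags, words, newlines, or any other single
# non-space character; spaces are dropped.
TOKEN_RE = re.compile(r"</?\w+>|\w+|\n|\S")


def tokenize(text):
    return TOKEN_RE.findall(text)


def build_pairs(inputs, program):
    """Run `program` on each input and collect token-stream pairs.

    Only the observed input/output behavior is used. The same pairs,
    swapped, serve as training data for the reverse (output -> input) model.
    """
    forward, backward = [], []
    for src in inputs:
        out = program(src)  # execute the black box
        src_toks, out_toks = tokenize(src), tokenize(out)
        forward.append((src_toks, out_toks))   # mock: input -> output
        backward.append((out_toks, src_toks))  # reverse: output -> input
    return forward, backward


if __name__ == "__main__":
    fwd, bwd = build_pairs(["# foo bar", "- baz"], toy_markdown_to_html)
    print(fwd[0])
    print(bwd[1])
```

Reversibility here comes only from swapping the direction of each pair. A sequence-to-sequence model, such as a neural machine translation model, would then be trained separately on `forward` and on `backward`.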
Taxonomy
Topics: Teaching and Learning Programming · Software Engineering Research · Evolutionary Algorithms and Applications
