Generating Inputs for Grammar Mining using Dynamic Symbolic Execution
Andreas Pointner (University of Applied Sciences Upper Austria, Austria), Josef Pichler (University of Applied Sciences Upper Austria, Austria), Herbert Prähofer (Johannes Kepler University Linz, Austria)

TL;DR
This paper introduces an automated method that uses Dynamic Symbolic Execution to generate inputs for grammar mining, addressing the difficulty existing approaches have in covering the complete input language of a software component.
Contribution
The paper presents a fully automated input generation approach that enhances grammar mining by overcoming limitations of Dynamic Symbolic Execution (DSE) through iterative expansion and structured input generation.
Findings
Improves completeness of grammar extraction from limited input data
Successfully integrates DSE with grammar mining techniques
Enhances detection of edge cases and previously unsupported features
Abstract
A vast number of software systems include components that parse and process structured input. In addition to programming languages, which are analyzed by compilers or interpreters, there are numerous components that process standardized or proprietary data formats of varying complexity. Even if such components were initially developed and tested based on a specification, such as a grammar, numerous modifications and adaptations over the course of software evolution can make it impossible to precisely determine which inputs they actually accept. In this situation, grammar mining can be used to reconstruct the specification in the form of a grammar. Established approaches already produce useful results, provided that sufficient input data is available to fully cover the input language. However, achieving this completeness is a major challenge. In practice, only input data recorded…
