An Exploratory Study of Ad Hoc Parsers in Python
Michael Schröder, Marc Goritschnig, Jürgen Cito

TL;DR
This study investigates the characteristics of ad hoc parsers in Python, that is, short pieces of code that parse input using ordinary string functions, to better understand their structure and inform future program analysis techniques.
Contribution
It sets out a large-scale empirical analysis of ad hoc parsing code in open-source Python projects, aimed at revealing common patterns and characteristics to guide future research.
Findings
Identification of common syntactic patterns in ad hoc parsers
Discovery of semantic characteristics of parsing code
Clustering of parsing patterns based on analyzed metrics
Abstract
Background: Ad hoc parsers are pieces of code that use common string functions like split, trim, or slice to effectively perform parsing. Whether it is handling command-line arguments, reading configuration files, parsing custom file formats, or any number of other minor string processing tasks, ad hoc parsing is ubiquitous -- yet poorly understood. Objective: This study aims to reveal the common syntactic and semantic characteristics of ad hoc parsing code in real-world Python projects. Our goal is to understand the nature of ad hoc parsers in order to inform future program analysis efforts in this area. Method: We plan to conduct an exploratory study based on large-scale mining of open-source Python repositories from GitHub. We will use program slicing to identify program fragments related to ad hoc parsing and analyze these parsers and their surrounding contexts across 9 research…
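To make the term concrete, here is a minimal sketch of the kind of code the abstract calls an ad hoc parser. It is an illustration only and not taken from the paper or its dataset: the function name parse_config_line and the "key = value # comment" line format are assumptions chosen for the example. It combines the three operations the abstract names: slicing, trimming (strip in Python), and splitting.

```python
def parse_config_line(line):
    """Hypothetical ad hoc parser for lines like 'key = value  # comment'.

    Returns a (key, value) tuple, or None for blank or comment-only lines.
    """
    # Slice: drop a trailing comment by cutting at the first '#'.
    hash_pos = line.find("#")
    if hash_pos != -1:
        line = line[:hash_pos]

    # Trim: remove surrounding whitespace; an empty result carries no setting.
    line = line.strip()
    if not line:
        return None

    # Split: break on the first '=' only, so the value may itself contain '='.
    parts = line.split("=", 1)
    if len(parts) != 2:
        raise ValueError(f"expected 'key = value', got {line!r}")

    key, value = parts
    return key.strip(), value.strip()


if __name__ == "__main__":
    print(parse_config_line("timeout = 30  # seconds"))
```

Nothing in this function declares the accepted input language; the grammar is implicit in the order of the string operations. For instance, a '#' inside a value is silently treated as a comment start. Implicit behavior of this kind is what makes such code hard to reason about.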
Taxonomy
Topics: Software Engineering Research · Computational Physics and Python Applications · Software Testing and Debugging Techniques
