CRAQL: A Composable Language for Querying Source Code
Blake Johnson, Rahul Simha

TL;DR
CRAQL is a SQL-like query language for source code analysis. Its queries take sets of abstract syntax trees (ASTs) as input and return sets of ASTs as output, so queries compose and can match on tree structure and metadata as well as raw text.
Contribution
The paper introduces CRAQL, a novel query language for source code that supports composable queries on ASTs, enhancing expressiveness and applicability across programming languages.
Findings
CRAQL can query complex code structures with high clarity.
CRAQL demonstrates efficient analysis on millions of Java files.
The language offers advantages over existing code querying tools.
Abstract
This paper describes the design and implementation of CRAQL (Composable Repository Analysis and Query Language), a new query language for source code. The growth of source code mining and its applications suggest the need for a query language that can fully utilize and correlate across the unique structure and metadata of parsed source code. CRAQL is built on an underlying abstraction analogous to the underpinnings of SQL, but aimed at parsed source code. Thus, while SQL queries' inputs and outputs are sets of tuples, CRAQL queries' inputs and outputs are sets of abstract syntax trees (ASTs). This abstraction makes CRAQL queries composable (the output of one query can become the input to another) and improves the power of the language by allowing for querying of the tree structure and metadata, as well as raw text. Furthermore, the abstraction enables tree-specific language…
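The "sets of ASTs in, sets of ASTs out" abstraction described in the abstract can be illustrated with a small sketch. This is not CRAQL syntax and not the authors' implementation: it uses Python's standard `ast` module on Python source (the paper evaluates on Java), and the helper names `select`, `where`, and `compose` are hypothetical, chosen only to show why a uniform input and output type makes queries composable.

```python
import ast
from typing import Callable, Iterable, List

# Assumption for illustration: a "query" is any function that maps a
# collection of AST nodes to a collection of AST nodes.
Query = Callable[[Iterable[ast.AST]], List[ast.AST]]


def select(node_type: type) -> Query:
    """Hypothetical helper: find every node of node_type in the input trees."""
    def run(trees: Iterable[ast.AST]) -> List[ast.AST]:
        # ast.walk yields the tree's root and all of its descendants.
        return [n for t in trees for n in ast.walk(t) if isinstance(n, node_type)]
    return run


def where(predicate: Callable[[ast.AST], bool]) -> Query:
    """Hypothetical helper: keep only the input trees that satisfy predicate."""
    def run(trees: Iterable[ast.AST]) -> List[ast.AST]:
        return [t for t in trees if predicate(t)]
    return run


def compose(*queries: Query) -> Query:
    """Chain queries: the output of each becomes the input of the next."""
    def run(trees: Iterable[ast.AST]) -> List[ast.AST]:
        result = list(trees)
        for query in queries:
            result = query(result)
        return result
    return run


source = """
def load(path):
    return open(path).read()

def add(a, b):
    return a + b
"""

trees = [ast.parse(source)]

# "Calls made inside functions that take exactly one argument."
calls_in_unary_functions = compose(
    select(ast.FunctionDef),
    where(lambda f: len(f.args.args) == 1),
    select(ast.Call),
)

# ast.unparse requires Python 3.9 or later.
names = [ast.unparse(call.func) for call in calls_in_unary_functions(trees)]
```

Because every stage consumes and produces the same kind of value, any stage can be dropped, reordered, or reused in another pipeline without adapters, which is the property the abstract attributes to CRAQL's design.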
Taxonomy
Topics: Advanced Database Systems and Queries · Software Engineering Research · Web Data Mining and Analysis
