FABLE: A Novel Data-Flow Analysis Benchmark on Procedural Text for Large Language Model Evaluation
Vishal Pallagani, Nitin Gupta, John Aydin, Biplav Srivastava

TL;DR
FABLE is a comprehensive benchmark that evaluates large language models' ability to understand data flow in procedural texts across multiple domains, revealing current limitations and guiding future improvements.
Contribution
Introduces FABLE, the first systematic benchmark for assessing LLMs' data-flow reasoning in procedural contexts across diverse real-world domains.
Findings
Reasoning-focused models outperform general-purpose models in accuracy.
General-purpose and code-specific LLMs perform near random chance on data-flow reasoning tasks.
FABLE enables diagnostic evaluation of procedural understanding in LLMs.
Abstract
Understanding how data moves, transforms, and persists, known as data flow, is fundamental to reasoning in procedural tasks. Although large language models (LLMs) are fluent in natural and programming languages and are increasingly applied to decisions involving procedural tasks, they have not been systematically evaluated for their ability to perform data-flow reasoning. We introduce FABLE, an extensible benchmark designed to assess LLMs' understanding of data flow using structured, procedural text. FABLE adapts eight classical data-flow analyses from software engineering: reaching definitions, very busy expressions, available expressions, live variable analysis, interval analysis, type-state analysis, taint analysis, and concurrency analysis. These analyses are instantiated across three real-world domains: cooking recipes, travel routes, and automated plans. The benchmark includes…
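To make two of the analyses named in the abstract concrete, here is a minimal sketch of reaching definitions and live variable analysis applied to a straight-line cooking recipe. The recipe, the `Step` encoding, and the function names are hypothetical and chosen only for illustration; they are not FABLE's data format, prompts, or evaluation code, and the sketch assumes no branching or loops between steps.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (step id, entity the step defines, entities the step uses).
Step = Tuple[int, str, Set[str]]

# Illustrative recipe, not taken from the benchmark.
RECIPE: List[Step] = [
    (1, "batter", {"flour", "milk"}),    # mix flour and milk into batter
    (2, "oven", set()),                  # preheat the oven
    (3, "batter", {"batter", "eggs"}),   # whisk eggs in (redefines batter)
    (4, "cake", {"batter", "oven"}),     # bake the batter in the oven
]


def reaching_definitions(steps: List[Step]) -> Dict[int, Dict[str, int]]:
    """Forward pass: for each step, which defining step reaches it per entity."""
    current: Dict[str, int] = {}
    reaching: Dict[int, Dict[str, int]] = {}
    for step_id, defined, _uses in steps:
        reaching[step_id] = dict(current)  # definitions visible on entry
        current[defined] = step_id         # new definition kills the old one
    return reaching


def live_before(steps: List[Step]) -> Dict[int, Set[str]]:
    """Backward pass: entities still needed just before each step runs."""
    live: Set[str] = set()
    result: Dict[int, Set[str]] = {}
    for step_id, defined, uses in reversed(steps):
        live = (live - {defined}) | uses   # standard liveness transfer function
        result[step_id] = set(live)
    return result


if __name__ == "__main__":
    print(reaching_definitions(RECIPE)[4])  # which steps produced step 4's inputs
    print(live_before(RECIPE)[1])           # raw ingredients the recipe depends on
```

In this toy recipe, the batter that reaches the baking step is the one redefined in step 3, not the one first mixed in step 1. A benchmark question on reaching definitions would turn on exactly that kind of distinction.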
Taxonomy
Topics: Topic Modeling
