Crossing the NL/PL Divide: Information Flow Analysis Across the NL/PL Boundary in LLM-Integrated Code

Zihao Xu; Xiao Cheng; Ruijie Meng; Yuekang Li

arXiv:2603.28345 · cs.SE · March 31, 2026

TL;DR

This paper introduces an information flow analysis method that bridges the natural-language/programming-language (NL/PL) boundary created by LLM API calls, so that analyses such as taint tracking and program slicing can follow data through those calls.

Contribution

The paper presents the first taxonomy-based approach, grounded in quantitative information flow theory, for classifying and analyzing the NL/PL boundaries that LLM calls introduce in code.

Findings

01

The taxonomy was validated on real-world Python code with high inter-rater reliability (Cohen's κ = 0.82).

02

Achieved a 92.3% F1 score on a taint propagation task using taxonomy-based filtering.

03

Reduced backward slice size by 15% using taxonomy-informed analysis.

Abstract

LLM API calls are becoming a ubiquitous program construct, yet they create a boundary that no existing program analysis can cross: runtime values enter a natural-language prompt, undergo opaque processing inside the LLM, and re-emerge as code, SQL, JSON, or text that the program consumes. Every analysis that tracks data across function boundaries, including taint analysis, program slicing, dependency analysis, and change-impact analysis, relies on dataflow summaries of callee behavior. LLM calls have no such summaries, breaking all of these analyses at what we call the NL/PL boundary. We present the first information flow method to bridge this boundary. Grounded in quantitative information flow theory, our taxonomy defines 24 labels along two orthogonal dimensions: information preservation level (from lexically preserved to fully blocked) and output modality (natural language,…
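To make the idea of a dataflow summary for an LLM call concrete, here is a minimal Python sketch of how a preservation label might drive taint propagation. It is an illustration under stated assumptions, not the authors' implementation: the `LLMCallSummary` class, the `propagate_taint` function, and the variable names are hypothetical, only the two endpoint levels named in the abstract are modeled, and treating every level other than fully blocked as propagating is a simplification chosen here.

```python
from dataclasses import dataclass
from enum import Enum


class Preservation(Enum):
    # Only the two endpoints named in the abstract are modeled here;
    # the paper's intermediate levels are deliberately not reproduced.
    LEXICALLY_PRESERVED = "lexically_preserved"
    FULLY_BLOCKED = "fully_blocked"


@dataclass(frozen=True)
class LLMCallSummary:
    """Hypothetical dataflow summary for a single LLM call site."""
    prompt_inputs: frozenset   # variables interpolated into the prompt
    output: str                # variable that receives the LLM response
    preservation: Preservation


def propagate_taint(tainted, summaries):
    """Apply the call summaries in order and return the tainted variables."""
    tainted = set(tainted)
    for summary in summaries:
        # Assumption: taint crosses the call unless the label says the
        # input information is fully blocked.
        flows = summary.preservation is not Preservation.FULLY_BLOCKED
        if flows and tainted & summary.prompt_inputs:
            tainted.add(summary.output)
    return tainted


calls = [
    # The user query is copied into generated SQL, so taint flows through.
    LLMCallSummary(frozenset({"user_query"}), "sql",
                   Preservation.LEXICALLY_PRESERVED),
    # Labeled as fully blocked, so the output is treated as untainted.
    LLMCallSummary(frozenset({"user_query"}), "is_spam",
                   Preservation.FULLY_BLOCKED),
]
result = propagate_taint({"user_query"}, calls)
print(sorted(result))  # ['sql', 'user_query']
```

With such summaries in place, an LLM call can be handled like any other callee: the analysis consults the label at the call site instead of stopping at the boundary.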
