When Prompt Engineering Meets Software Engineering: CNL-P as Natural and Robust "APIs" for Human-AI Interaction
Zhenchang Xing, Yang Liu, Zhuo Cheng, Qing Huang, Dehai Zhao, Daniel Sun, Chenhua Liu

TL;DR
This paper introduces CNL-P, a structured natural language prompt format inspired by software engineering principles, which improves LLM response quality and interpretability through precise grammar, semantic norms, and supporting tools.
Contribution
It presents CNL-P, a novel controlled natural language prompt framework that integrates prompt engineering best practices with software engineering principles, including tools for conversion and validation.
Findings
CNL-P improves LLM response consistency and quality.
The NL2CNL-P conversion tool lowers the barrier to adopting CNL-P.
Static analysis enhances prompt syntactic and semantic accuracy.
Abstract
With the growing capabilities of large language models (LLMs), they are increasingly applied in areas like intelligent customer service, code generation, and knowledge management. Natural language (NL) prompts act as the "APIs" for human-LLM interaction. To improve prompt quality, best practices for prompt engineering (PE) have been developed, including writing guidelines and templates. Building on this, we propose Controlled NL for Prompt (CNL-P), which not only incorporates PE best practices but also draws on key principles from software engineering (SE). CNL-P introduces precise grammar structures and strict semantic norms, further eliminating NL's ambiguity, allowing for a declarative but structured and accurate expression of user intent. This helps LLMs better interpret and execute the prompts, leading to more consistent and higher-quality outputs. We also introduce an NL2CNL-P…
Peer Reviews
Decision: ICLR 2025 Poster
- Clearly motivated framing of prompting as an API, enabling users to leverage AI model capabilities without extensive technical expertise.
- Strong connection to first principles in SE, providing a foundation to address challenges in complex NL-PL conversion and prompt-code coupling. This approach is particularly beneficial for language experts and non-technical users by effectively decoupling prompts from code.
- Dimensions to assess NL-to-CNL-P conversion quality are well-designed, covering d

- The specific aims of the work remain unclear; while high-level challenges and design considerations are presented, the precise goals are hard to identify.
- Experiment setup in RQ1 lacks clarity on how the five dimensions are measured and how the 93 prompt instances were chosen. There are also no human validation results presented, even as partial samples.
- RQ1 primarily assesses design considerations, while RQ2 focuses on accuracy. Given the current setup and task scope in RQ2, the advantage

Theoretical Innovation:
- Novel synthesis of SE principles with PE practices
- Comprehensive formal grammar for controlled natural language
- Innovative application of static analysis theory to natural language
Technical Contribution:
- Formal specification of the CNL-P grammar
- Theoretical framework for prompt verification
- Rigorous performance analysis across LLM architectures
- Novel approach to static analysis of natural language
Research Impact:
- Opens new theoretical directions in prom

Theoretical Limitations:
- Formal analysis of expressive power could be stronger.
- Completeness properties of the static analysis need more discussion.
- Edge cases in the formal grammar require deeper analysis.
- Theoretical bounds need more rigorous treatment.
Methodological Concerns:
- Formal comparison with other structured approaches could be deeper.
- Statistical analysis could be more comprehensive.
- Theoretical justification for design choices needs elaboration.
- Formal properties of

1. The paper effectively combines prompt engineering and software engineering to introduce CNL-P as a structured, precise language for prompt design.
2. CNL-P’s modular design enables independent development, testing, and maintenance.
3. Its linting tool supports syntactic and semantic checks, which enables static analysis techniques for natural language.

I have several concerns regarding the evaluation section:
1. For RQ1, the authors asked ChatGPT-4o to assess the quality of conversions from natural language prompts to CNL-P or NL style guides based on five criteria. However, the reliability of this evaluation is not properly validated:
   - The authors did not provide evidence of how the evaluation results correlate with actual human evaluations, which would strengthen their claims.
   - There is no guideline detailing how the scale for eac
Taxonomy
Topics: Software Engineering Techniques and Practices · Software Engineering Research · Topic Modeling
