PRISM: Prompt Reliability via Iterative Simulation and Monitoring for Enterprise Conversational AI
Keshava Chaitanya, Jahnavi Gundakaram

TL;DR
PRISM is a framework that continuously tests, diagnoses, and repairs prompts for enterprise conversational AI, ensuring high reliability despite LLM behavioral drift over time.
Contribution
PRISM introduces a closed-loop, iterative approach to prompt engineering that automates testing, diagnosis, and repair for maintaining prompt reliability in production environments.
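As an illustration only, the test, diagnose, and repair cycle described above can be sketched in a few lines of Python. This is not PRISM's implementation: the names `TestCase`, `failing_cases`, `repair_loop`, `toy_agent`, and `toy_repair` are hypothetical, the substring check is an assumed pass criterion, and the agent and repair step are stubs standing in for real LLM calls. The 0.99 target simply echoes the reliability figure quoted in the findings.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TestCase:
    user_input: str
    expected_substring: str  # assumed pass criterion, for illustration only


Agent = Callable[[str, str], str]             # (prompt, user_input) -> reply
Repair = Callable[[str, List[TestCase]], str]  # (prompt, failures) -> new prompt


def failing_cases(prompt: str, agent: Agent, cases: List[TestCase]) -> List[TestCase]:
    """Run every test case and return the ones whose reply misses the expectation."""
    return [c for c in cases if c.expected_substring not in agent(prompt, c.user_input)]


def repair_loop(prompt: str, agent: Agent, cases: List[TestCase], repair: Repair,
                target: float = 0.99, max_rounds: int = 5) -> Tuple[str, float]:
    """Test the prompt, and repair it until the pass rate reaches `target`
    or `max_rounds` repairs have been attempted."""
    if not cases:
        raise ValueError("at least one test case is required")
    for _ in range(max_rounds):
        failures = failing_cases(prompt, agent, cases)
        score = 1 - len(failures) / len(cases)
        if score >= target:
            return prompt, score
        prompt = repair(prompt, failures)  # diagnosis and repair happen here
    failures = failing_cases(prompt, agent, cases)
    return prompt, 1 - len(failures) / len(cases)


def toy_agent(prompt: str, user_input: str) -> str:
    # Stub for an LLM call: greets only when the prompt asks for it.
    reply = f"echo: {user_input}"
    return "Hello! " + reply if "greet" in prompt else reply


def toy_repair(prompt: str, failures: List[TestCase]) -> str:
    # Stub for a repair step: appends the instruction the failures suggest is missing.
    return prompt + " Always greet the user."


CASES = [TestCase("hi", "Hello"), TestCase("where is my order", "Hello")]
final_prompt, score = repair_loop("You are a support agent.", toy_agent, CASES, toy_repair)
print(final_prompt, score)
```

Rerunning the same loop on a schedule against a fixed test suite is one simple way to notice a regression after a silent model change, which is the monitoring half of the idea.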
Findings
Reduces prompt authoring time from 2 days to under 30 minutes
Achieves 99% reliability across enterprise agents
Detects and repairs regressions within 24 hours
Abstract
Deploying large language model (LLM)-driven conversational agents in enterprise settings requires prompts that are simultaneously correct at launch and resilient to the non-deterministic behavioral drift that characterizes production LLM deployments. Existing prompt optimization frameworks address prompt quality as a one-time compile-time problem, leaving open the equally critical question of how to detect and repair prompt regressions caused by silent LLM behavior changes over time. We present PRISM (Prompt Reliability via Iterative Simulation and Monitoring), a closed-loop framework that treats prompt engineering as a continuous reliability engineering problem rather than a one-time authorship task. PRISM takes as input plain-language agent requirements, a set of configured tools and memory variables, and an initial draft prompt. It automatically generates test cases from…
