PrismaDV: Automated Task-Aware Data Unit Test Generation
Hao Chen, Arnab Phani, Sebastian Schelter

TL;DR
PrismaDV is an AI system that generates task-aware data unit tests by analyzing downstream code and data, improving data validation accuracy for enterprise datasets.
Contribution
The paper introduces PrismaDV, a compound AI system that combines code and data analysis with prompt optimization to create adaptive, task-aware data validation tests.
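To make "task-aware" concrete, here is a minimal hand-written Python sketch of the kind of test PrismaDV aims to generate automatically. The downstream task, the column names (`category`, `price`), and every function name are hypothetical illustrations. They are not PrismaDV's output format or API. Each test encodes an assumption that the task code relies on implicitly, rather than a generic schema rule.

```python
import math
from typing import Any, Callable

Row = dict[str, Any]


# Hypothetical downstream task: mean log-price per category.
# It silently assumes that every row has a non-null "category" and a
# strictly positive numeric "price" (math.log fails otherwise).
def downstream_task(rows: list[Row]) -> dict[str, float]:
    logs: dict[str, list[float]] = {}
    for row in rows:
        logs.setdefault(row["category"], []).append(math.log(row["price"]))
    return {cat: sum(vals) / len(vals) for cat, vals in logs.items()}


# Hypothetical task-aware data unit tests, written by hand for illustration:
# each one checks an assumption read off downstream_task above.
def columns_present(rows: list[Row]) -> bool:
    return all("category" in r and "price" in r for r in rows)


def category_not_null(rows: list[Row]) -> bool:
    return all(r.get("category") is not None for r in rows)


def price_positive_numeric(rows: list[Row]) -> bool:
    # math.log needs a positive number; None, strings, 0 and negatives are rejected.
    return all(isinstance(r.get("price"), (int, float)) and r["price"] > 0 for r in rows)


DATA_UNIT_TESTS: dict[str, Callable[[list[Row]], bool]] = {
    "columns_present": columns_present,
    "category_not_null": category_not_null,
    "price_positive_numeric": price_positive_numeric,
}


def run_data_unit_tests(rows: list[Row]) -> list[str]:
    """Return the names of the data unit tests that fail on this batch."""
    return [name for name, test in DATA_UNIT_TESTS.items() if not test(rows)]


good = [{"category": "a", "price": 10.0}, {"category": "b", "price": 2.5}]
bad = [{"category": "a", "price": 0.0}, {"category": None, "price": 3.0}]
print(run_data_unit_tests(good))  # []
print(run_data_unit_tests(bad))   # ['category_not_null', 'price_positive_numeric']
```

A task-agnostic check such as "price is a float column" would accept the `bad` batch, whereas the task-aware tests flag it before `downstream_task(bad)` fails on `math.log(0.0)`.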
Findings
PrismaDV outperforms baseline methods in generating relevant data unit tests.
SIFTA (Selective Informative Feedback for Task Adaptation), PrismaDV's prompt-optimization framework, improves prompt quality, leading to better test generation.
The system is evaluated on 60 tasks across five datasets, showing consistent improvements.
Abstract
Data is a central resource for modern enterprises, and data validation is essential for ensuring the reliability of downstream applications. However, existing automated data unit testing frameworks are largely task-agnostic: they validate datasets without considering the semantics and requirements of the code that consumes the data. We present PrismaDV, a compound AI system that analyzes downstream task code together with dataset profiles to identify data access patterns, infer implicit data assumptions, and generate task-aware executable data unit tests. To further adapt the data unit tests over time to specific datasets and downstream tasks, we propose "Selective Informative Feedback for Task Adaptation" (SIFTA), a prompt-optimization framework that leverages the scarce outcomes from the execution of data unit tests and downstream tasks. We evaluate PrismaDV on two new benchmarks…
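The abstract describes SIFTA only at a high level. The Python sketch below illustrates one plausible reading of "selective informative feedback": from scarce execution outcomes, keep only the runs where the data unit test verdict and the downstream task outcome disagree, up to a fixed budget, and render them as text for a prompt. The `Outcome` type, the labels, the budget, and the missed-issues-first ordering are all assumptions for illustration, not SIFTA's actual algorithm. The prompt-optimization step itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """One observed run: did the data unit tests pass, and did the downstream task succeed?"""
    batch_id: str
    tests_passed: bool
    task_succeeded: bool


def label(o: Outcome) -> str:
    # Agreement between tests and task outcome carries little signal;
    # disagreements show where the current tests are too lenient or too strict.
    if o.tests_passed and not o.task_succeeded:
        return "missed_issue"  # tests passed, task still failed
    if not o.tests_passed and o.task_succeeded:
        return "false_alarm"  # tests failed, task was fine
    return "agreement"


def select_feedback(outcomes: list[Outcome], budget: int) -> list[Outcome]:
    """Keep at most `budget` disagreement cases, missed issues first (an assumed priority)."""
    informative = [o for o in outcomes if label(o) != "agreement"]
    # sorted() is stable; False sorts before True, so missed issues come first.
    informative = sorted(informative, key=lambda o: label(o) != "missed_issue")
    return informative[:budget]


def render_feedback(selected: list[Outcome]) -> str:
    """Format the selected cases as text that could be added to a test-generation prompt."""
    return "\n".join(f"- batch {o.batch_id}: {label(o).replace('_', ' ')}" for o in selected)


outcomes = [
    Outcome("b1", tests_passed=True, task_succeeded=True),
    Outcome("b2", tests_passed=False, task_succeeded=True),
    Outcome("b3", tests_passed=True, task_succeeded=False),
    Outcome("b4", tests_passed=False, task_succeeded=False),
]
print(render_feedback(select_feedback(outcomes, budget=2)))
# - batch b3: missed issue
# - batch b2: false alarm
```

Filtering to disagreements keeps the feedback short when outcomes are scarce, since runs where tests and task agree say little about how the tests should change.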
