Oracle-Checker Scheme for Evaluating a Generative Large Language Model
Yueling Jenny Zeng, Li-C. Wang, Thomas Ibbetson

TL;DR
This paper introduces an oracle-checker scheme for evaluating the answers of a generative large language model, using one checker based on property testing (demonstrated on entity extraction) and one based on program checking (demonstrated on paraphrase decision).
Contribution
It proposes an evaluation framework for LLM answers built on two types of checkers, one following the idea of property testing and one following the idea of program checking.
Findings
The property-testing checker is demonstrated on entity extraction
The program-checking checker is demonstrated on paraphrase decision
The two checker types are applied in two separate contexts
Abstract
This work presents a novel approach, called the oracle-checker scheme, for evaluating the answer given by a generative large language model (LLM). Two types of checkers are presented. The first type of checker follows the idea of property testing. The second type of checker follows the idea of program checking. Their applications are demonstrated in two separate contexts, entity extraction and paraphrase decision, respectively.
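The abstract does not spell out how the checkers work, so the following is only a minimal illustrative sketch of the two general ideas, not the authors' method. Everything in it is an assumption made for illustration: the toy oracles standing in for an LLM, the function names, and the specific properties checked (entities of a concatenated text equal the union of the entities of its parts; a paraphrase decision is symmetric and reflexive).

```python
import random
from typing import Callable, List, Set


def toy_extract_entities(text: str) -> Set[str]:
    """Hypothetical stand-in for an LLM oracle: capitalized words are entities."""
    return {w.strip(".,") for w in text.split() if w[:1].isupper()}


def toy_is_paraphrase(a: str, b: str) -> bool:
    """Hypothetical stand-in for an LLM oracle: same bag of lowercase words."""
    return sorted(a.lower().split()) == sorted(b.lower().split())


def property_test_extraction(
    oracle: Callable[[str], Set[str]],
    texts: List[str],
    trials: int = 20,
    seed: int = 0,
) -> bool:
    """Property-testing-style check on randomly sampled queries.

    Assumed property (chosen for illustration only): the entities of a
    concatenation equal the union of the entities of the two parts.
    Returns False as soon as one sampled pair violates the property.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.choice(texts), rng.choice(texts)
        if oracle(a + " " + b) != oracle(a) | oracle(b):
            return False
    return True


def check_paraphrase_answer(
    oracle: Callable[[str, str], bool], a: str, b: str
) -> bool:
    """Program-checking-style check of a single answer.

    The oracle's answer on (a, b) is accepted only if the same oracle is
    consistent on related queries: (b, a) must give the same answer, and
    each sentence must be judged a paraphrase of itself.
    """
    answer = oracle(a, b)
    return answer == oracle(b, a) and oracle(a, a) and oracle(b, b)
```

In this sketch a checker that passes only shows that the oracle's answers are mutually consistent on the queries tried; it does not prove the answers are correct. A checker that fails, on the other hand, exposes a concrete inconsistency without needing labeled ground truth.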
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
