CriticAL: Critic Automation with Language Models
Michael Y. Li, Vivek Vajipey, Noah D. Goodman, Emily B. Fox

TL;DR
CriticAL leverages large language models to automate scientific model criticism by generating and evaluating discrepancies between models and data, improving model validation and development.
Contribution
This paper introduces CriticAL, a novel framework that automates model criticism using LLMs within a hypothesis testing approach, addressing hallucination issues and enhancing scientific discovery.
Findings
CriticAL reliably generates correct critiques without hallucinating incorrect ones.
Judges prefer CriticAL's critiques over alternative approaches for their transparency and actionability.
CriticAL enables LLM scientists to improve models on real datasets.
Abstract
Understanding the world through models is a fundamental goal of scientific research. While large language model (LLM) based approaches show promise in automating scientific discovery, they often overlook the importance of criticizing scientific models. Criticizing models deepens scientific understanding and drives the development of more accurate models. Automating model criticism is difficult because it traditionally requires a human expert to define how to compare a model with data and to evaluate whether the discrepancies are significant. Both steps rely heavily on understanding the modeling assumptions and the domain. Although LLM-based critic approaches are appealing, they introduce new challenges: LLMs might hallucinate the critiques themselves. Motivated by this, we introduce CriticAL (Critic Automation with Language Models). CriticAL uses LLMs to generate summary statistics that capture…
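To make the hypothesis-testing idea concrete, here is a minimal sketch of checking one summary statistic against data simulated from a model. It is not CriticAL's implementation: the function names (`empirical_p_value`, `dispersion`), the Poisson model, and the synthetic "observed" data are all assumptions chosen for illustration, and the statistic is hand-written where the framework would have an LLM propose one.

```python
import numpy as np

def empirical_p_value(observed, simulated, statistic):
    """Upper-tail empirical p-value of statistic(observed) relative to
    the same statistic computed on datasets simulated from the model.
    The +1 terms keep the estimate strictly above zero."""
    t_obs = statistic(observed)
    t_sim = np.array([statistic(s) for s in simulated])
    return (1 + np.sum(t_sim >= t_obs)) / (1 + len(t_sim))

def dispersion(x):
    """Variance-to-mean ratio; close to 1 for Poisson-distributed counts.
    Hypothetical stand-in for an LLM-proposed summary statistic."""
    return np.var(x) / np.mean(x)

rng = np.random.default_rng(0)

# Assumed model under criticism: counts are Poisson with rate 3.
simulated = rng.poisson(lam=3.0, size=(500, 200))

# Synthetic "observed" data with the same mean but much larger variance.
observed = rng.negative_binomial(n=1, p=0.25, size=200)

p = empirical_p_value(observed, simulated, dispersion)
print(f"dispersion p-value: {p:.4f}")
```

A small p-value indicates that the model rarely reproduces the observed value of the statistic, so the discrepancy is backed by a test and does not rest on an unverified claim.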
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies
