Large language models for generating rules, yay or nay?
Shangeetha Sivasothy, Scott Barnett, Rena Logothetis, Mohamed Abdelrazek, Zafaryab Rasool, Srikanth Thudumu, Zac Brannelly

TL;DR
This paper explores using large language models to generate logic rules for safety-critical systems, demonstrating their potential to bootstrap implementation but highlighting limitations in rule completeness and threshold generation.
Contribution
It introduces a novel approach leveraging LLMs as world models for rule generation in safety-critical domains, validated with a medical use case during COVID-19.
Findings
LLMs can bootstrap implementation by generating logic rules.
LLMs produce fewer rules than domain experts.
LLMs cannot generate thresholds for rules.
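The findings above suggest a workflow in which an LLM drafts the structure of a rule and an SME supplies the threshold before deployment. The following is a minimal sketch of that idea, not the authors' implementation: the `DraftRule` type, its field names, the `oxygen_saturation` signal, and the value `92.0` are all hypothetical placeholders chosen for illustration and carry no clinical meaning.

```python
import operator
from dataclasses import dataclass, replace
from typing import Optional

# Supported comparators for this illustrative rule format (an assumption).
_COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


@dataclass(frozen=True)
class DraftRule:
    """A logic rule as an LLM might draft it; all field names are hypothetical."""
    signal: str                        # name of the monitored quantity
    comparator: str                    # one of the keys in _COMPARATORS
    action: str                        # what the system does when the rule fires
    threshold: Optional[float] = None  # left empty until an SME provides it


def sme_review(rule: DraftRule, threshold: float) -> DraftRule:
    """Return a copy of the rule with the SME-provided threshold filled in."""
    return replace(rule, threshold=threshold)


def is_deployable(rule: DraftRule) -> bool:
    """A rule without a threshold must not reach deployment."""
    return rule.threshold is not None


def fires(rule: DraftRule, value: float) -> bool:
    """Evaluate a reviewed rule against an observed value."""
    if not is_deployable(rule):
        raise ValueError("rule has no threshold; SME review is required first")
    return _COMPARATORS[rule.comparator](value, rule.threshold)


# Hypothetical LLM output: structure only, no threshold.
draft = DraftRule(signal="oxygen_saturation", comparator="<", action="alert_clinician")

# Hypothetical SME review step; 92.0 is a placeholder, not clinical guidance.
reviewed = sme_review(draft, threshold=92.0)
```

Keeping `threshold` optional makes the gap explicit: a drafted rule cannot be evaluated until a reviewer fills it in, which mirrors the review-before-deployment step described in the abstract.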
Abstract
Engineering safety-critical systems such as medical devices and digital health intervention systems is complex, where long-term engagement with subject-matter experts (SMEs) is needed to capture the systems' expected behaviour. In this paper, we present a novel approach that leverages Large Language Models (LLMs), such as GPT-3.5 and GPT-4, as a potential world model to accelerate the engineering of software systems. This approach involves using LLMs to generate logic rules, which can then be reviewed and informed by SMEs before deployment. We evaluate our approach using a medical rule set, created from the pandemic intervention monitoring system in collaboration with medical professionals during COVID-19. Our experiments show that 1) LLMs have a world model that bootstraps implementation, 2) LLMs generated fewer rules than experts, and 3) LLMs do not have the capacity…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
