Prompting or Fine-tuning? Exploring Large Language Models for Causal Graph Validation
Yuni Susanti, Nina Holsmoelle

TL;DR
This paper investigates the use of Large Language Models to evaluate causal graphs, comparing prompting and fine-tuning methods, and finds that fine-tuned models outperform prompting approaches by up to 20.5 F1 points.
Contribution
It introduces a systematic comparison of prompting versus fine-tuning LLMs for causal relation evaluation, highlighting the superior performance of fine-tuned models.
Findings
Fine-tuned models outperform prompting methods by up to 20.5 F1 points.
Fine-tuning yields better causal inference accuracy even with smaller models.
LLMs can effectively evaluate causality in biomedical and general domains.
Abstract
This study explores the capability of Large Language Models (LLMs) to evaluate causality in causal graphs generated by conventional statistical causal discovery methods, a task traditionally reliant on manual assessment by human subject matter experts. To bridge this gap in causality assessment, LLMs are employed to evaluate the causal relationships by determining whether a causal connection between variable pairs can be inferred from textual context. Our study compares two approaches: (1) a prompting-based method for zero-shot and few-shot causal inference, and (2) fine-tuning language models for the causal relation prediction task. While prompt-based LLMs have demonstrated versatility across various NLP tasks, our experiments on biomedical and general-domain datasets show that fine-tuned models consistently outperform them, achieving up to a 20.5-point improvement in F1 score, even when…
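The prompting setup and the F1 comparison described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the prompt wording, the `build_prompt` and `binary_f1` helper names, and the yes/no label set are hypothetical, and the paper may use a different template or a different F1 variant (for example, macro-averaged).

```python
from typing import Optional, Sequence, Tuple

# Hypothetical few-shot example: (variable A, variable B, context, label).
Example = Tuple[str, str, str, str]


def build_prompt(
    var_a: str,
    var_b: str,
    context: str,
    examples: Optional[Sequence[Example]] = None,
) -> str:
    """Build a zero-shot prompt, or a few-shot one if examples are given.

    The wording is an illustrative assumption, not the paper's template.
    """
    parts = [
        "Decide whether the text supports a causal relation between "
        "the two variables. Answer 'yes' or 'no'."
    ]
    # Each demonstration repeats the query layout with its answer filled in.
    for ex_a, ex_b, ex_ctx, ex_label in examples or []:
        parts.append(
            f"Context: {ex_ctx}\nVariables: {ex_a}, {ex_b}\nAnswer: {ex_label}"
        )
    # The final block is the actual query; the model completes the answer.
    parts.append(f"Context: {context}\nVariables: {var_a}, {var_b}\nAnswer:")
    return "\n\n".join(parts)


def binary_f1(gold: Sequence[str], pred: Sequence[str], positive: str = "yes") -> float:
    """F1 of the positive (causal) class: harmonic mean of precision and recall."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With this layout, the same `binary_f1` function can score the answers parsed from a prompted model and the labels predicted by a fine-tuned classifier, so the two approaches are compared on identical variable pairs.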
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Quality and Management · Semantic Web and Ontologies
