SylloBio-NLI: Evaluating Large Language Models on Biomedical Syllogistic Reasoning
Magdalena Wysocka, Danilo Carvalho, Oskar Wysocki, Marco Valentino, Andre Freitas

TL;DR
This paper introduces SylloBio-NLI, a framework for evaluating large language models on biomedical syllogistic reasoning, revealing significant challenges and the impact of prompting techniques on model performance.
Contribution
The paper presents a novel framework for biomedical syllogistic reasoning evaluation and provides extensive analysis of LLMs' capabilities and limitations in this domain.
Findings
Zero-shot LLMs achieve average accuracy between 70% (generalized modus ponens) and 23% (disjunctive syllogism), depending on the scheme.
Few-shot prompting improves performance significantly.
Models are highly sensitive to superficial lexical variations.
Abstract
Syllogistic reasoning is crucial for Natural Language Inference (NLI). This capability is particularly significant in specialized domains such as biomedicine, where it can support automatic evidence interpretation and scientific discovery. This paper presents SylloBio-NLI, a novel framework that leverages external ontologies to systematically instantiate diverse syllogistic arguments for biomedical NLI. We employ SylloBio-NLI to evaluate Large Language Models (LLMs) on identifying valid conclusions and extracting supporting evidence across 28 syllogistic schemes instantiated with human genome pathways. Extensive experiments reveal that biomedical syllogistic reasoning is particularly challenging for zero-shot LLMs, which achieve an average accuracy between 70% on generalized modus ponens and 23% on disjunctive syllogism. At the same time, we found that few-shot prompting can boost the…
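To make the idea of instantiating syllogistic schemes from an ontology concrete, here is a minimal Python sketch for two of the schemes named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the toy ontology, the placeholder gene and pathway names, the sentence templates, and the helper names (`Syllogism`, `generalized_modus_ponens`, `disjunctive_syllogism`) are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy ontology with placeholder names (not data from the paper).
SUBPATHWAY_OF = {"Pathway A1": "Pathway A"}   # child pathway -> parent pathway
GENE_IN = {"GENE1": "Pathway A1"}             # gene -> pathway it belongs to


@dataclass(frozen=True)
class Syllogism:
    scheme: str
    premises: tuple
    conclusion: str
    label: bool  # True if the conclusion follows from the premises


def generalized_modus_ponens(gene, valid=True):
    """All members of P are members of Q; g is in P; therefore g is in Q."""
    child = GENE_IN[gene]
    parent = SUBPATHWAY_OF[child]
    premises = (
        f"Every member of {child} is a member of {parent}.",
        f"{gene} is a member of {child}.",
    )
    if valid:
        conclusion = f"{gene} is a member of {parent}."
    else:
        # Negated conclusion: does not follow from the premises.
        conclusion = f"{gene} is not a member of {parent}."
    return Syllogism("generalized modus ponens", premises, conclusion, valid)


def disjunctive_syllogism(gene, other_pathway, valid=True):
    """g is in P or Q; g is not in Q; therefore g is in P."""
    actual = GENE_IN[gene]
    premises = (
        f"{gene} is a member of {actual} or {other_pathway}.",
        f"{gene} is not a member of {other_pathway}.",
    )
    if valid:
        conclusion = f"{gene} is a member of {actual}."
    else:
        # Picks the excluded disjunct: contradicts the second premise.
        conclusion = f"{gene} is a member of {other_pathway}."
    return Syllogism("disjunctive syllogism", premises, conclusion, valid)


if __name__ == "__main__":
    for item in (
        generalized_modus_ponens("GENE1"),
        disjunctive_syllogism("GENE1", "Pathway B", valid=False),
    ):
        print(item.scheme, "| valid:", item.label)
        for premise in item.premises:
            print("  premise:", premise)
        print("  conclusion:", item.conclusion)
```

Each generated item pairs natural-language premises with a candidate conclusion and a validity label, which is the shape of input an NLI-style evaluation needs; a real pipeline would draw the entities from an external pathway ontology rather than a hand-written dictionary.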
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Machine Learning in Healthcare
