TL;DR
This paper fine-tunes a quantized Mistral-7B LLM on an augmented training set for biomedical NLI over clinical trial reports, achieving a notable macro F1-score but showing limitations in faithfulness and consistency.
Contribution
It introduces a prompt and fine-tuning approach for Mistral-7B on biomedical NLI, demonstrating the model's potential and limitations in clinical trial statement classification.
Findings
Achieved a notable macro F1-score on the NLI4CT task
Identified limitations in faithfulness and consistency
Released all developed code publicly on GitHub
Abstract
This paper describes our approach to the SemEval-2024 safe biomedical Natural Language Inference for Clinical Trials (NLI4CT) task, which concerns classifying statements about Clinical Trial Reports (CTRs). We explored the capabilities of Mistral-7B, a generalist open-source Large Language Model (LLM). We developed a prompt for the NLI4CT task, and fine-tuned a quantized version of the model using an augmented version of the training dataset. The experimental results show that this approach can produce notable results in terms of the macro F1-score, while having limitations in terms of faithfulness and consistency. All the developed code is publicly available in a GitHub repository.
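For readers unfamiliar with the headline metric, the macro F1-score is the unweighted mean of the per-class F1 scores, so both classes count equally regardless of how often each appears. The snippet below is a minimal sketch of that computation and is not the authors' evaluation code; the function name `macro_f1` and the label strings `"Entailment"` and `"Contradiction"` are assumptions made for illustration.

```python
from collections import Counter


def macro_f1(gold, pred, labels=("Entailment", "Contradiction")):
    """Return the unweighted mean of per-class F1 scores.

    The default label names are an assumption for this sketch.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[g] += 1   # missed an instance of class g

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = ["Entailment", "Entailment", "Contradiction", "Contradiction"]
    pred = ["Entailment", "Contradiction", "Contradiction", "Contradiction"]
    print(round(macro_f1(gold, pred), 4))
```

Because each class contributes equally to the average, a model that favors one label is penalized even when overall accuracy looks acceptable. Macro F1 says nothing about faithfulness or consistency, which the task evaluates separately.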
