Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation

Artur Guimarães, Bruno Martins, João Magalhães

arXiv:2408.03127 · cs.CL · August 7, 2024

TL;DR

This paper explores fine-tuning a Mistral-7B LLM with data augmentation for biomedical NLI in clinical trials, achieving notable macro F1 scores but facing challenges in faithfulness and consistency.

Contribution

The paper introduces a task-specific prompt and fine-tunes a quantized Mistral-7B model on an augmented training set for biomedical NLI, showing both the model's potential and its limitations in classifying statements about clinical trials.

Findings

01. Achieved a notable macro F1-score on the NLI4CT task

02. Identified limitations in faithfulness and consistency

03. Released all code publicly on GitHub

Abstract

This paper describes our approach to the SemEval-2024 safe biomedical Natural Language Inference for Clinical Trials (NLI4CT) task, which concerns classifying statements about Clinical Trial Reports (CTRs). We explored the capabilities of Mistral-7B, a generalist open-source Large Language Model (LLM). We developed a prompt for the NLI4CT task, and fine-tuned a quantized version of the model using an augmented version of the training dataset. The experimental results show that this approach can produce notable results in terms of the macro F1-score, while having limitations in terms of faithfulness and consistency. All the developed code is publicly available in a GitHub repository.
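
For readers unfamiliar with the headline metric, the following is a minimal sketch of how a macro F1-score can be computed for a two-label classification task such as NLI4CT. The function name `macro_f1` and the toy predictions are illustrative assumptions and are not taken from the authors' repository; the official task scorer may differ in its details.

```python
def macro_f1(gold, pred, labels=("Entailment", "Contradiction")):
    """Unweighted mean of the per-label F1 scores (hypothetical helper)."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)  # true positives
        fp = sum(g != label and p == label for g, p in pairs)  # false positives
        fn = sum(g == label and p != label for g, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the label never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented purely for illustration.
gold = ["Entailment", "Entailment", "Contradiction", "Contradiction"]
pred = ["Entailment", "Contradiction", "Contradiction", "Contradiction"]
score = macro_f1(gold, pred)
```

Because each label contributes equally to the average, macro F1 penalizes a model that performs well on one label while neglecting the other.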

Code & Models

Repositories

araag2/SemEval2024-Task2
PyTorch · Official