DFKI-NLP at SemEval-2024 Task 2: Towards Robust LLMs Using Data Perturbations and MinMax Training

Bhuvanesh Verma; Lisa Raithel

arXiv:2405.00321·cs.CL·May 2, 2024

TL;DR

This paper presents a robust LLM-based approach for natural language inference on clinical trial reports, employing data perturbations and MinMax training to improve reasoning on complex, domain-specific texts.

Contribution

It introduces a novel data augmentation and training strategy using perturbations and MinMax training with the Mistral model for improved robustness in clinical NLP tasks.

Findings

01. Enhanced model robustness to semantic and numerical perturbations

02. Identification of challenging sections in clinical trial reports for reasoning

03. Effective handling of contradictions in domain-specific texts

Abstract

The NLI4CT task at SemEval-2024 emphasizes the development of robust models for Natural Language Inference on Clinical Trial Reports (CTRs) using large language models (LLMs). This edition introduces interventions specifically targeting the numerical, vocabulary, and semantic aspects of CTRs. Our proposed system harnesses the capabilities of the state-of-the-art Mistral model, complemented by an auxiliary model, to focus on the intricate input space of the NLI4CT dataset. Through the incorporation of numerical and acronym-based perturbations to the data, we train a robust system capable of handling both semantic-altering and numerical contradiction interventions. Our analysis on the dataset sheds light on the challenging sections of the CTRs for reasoning.
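The numerical and acronym-based perturbations mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the acronym table, the function names (`perturb_numbers`, `perturb_acronyms`), and the scaling factor are all assumptions made for illustration, and the handling of entailment/contradiction labels after perturbation is not shown.

```python
import re

# Hypothetical acronym table for illustration only; the list used in the
# paper is not given on this page.
ACRONYMS = {
    "adverse event": "AE",
    "overall survival": "OS",
    "progression-free survival": "PFS",
}

# Matches integers and simple decimals such as "12" or "30.0".
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def perturb_numbers(text: str, factor: float = 2.0) -> str:
    """Scale every number in the text by `factor` (assumed strategy)."""

    def scale(match: re.Match) -> str:
        token = match.group(0)
        value = float(token) * factor
        if "." in token:
            # Keep the same number of decimal places as the original token.
            decimals = len(token.split(".")[1])
            return f"{value:.{decimals}f}"
        return str(int(round(value)))

    return _NUMBER.sub(scale, text)


def perturb_acronyms(text: str, table: dict = ACRONYMS) -> str:
    """Replace full phrases with their acronyms, ignoring case."""
    for phrase, acronym in table.items():
        pattern = rf"\b{re.escape(phrase)}\b"
        text = re.sub(pattern, acronym, text, flags=re.IGNORECASE)
    return text


statement = "12 of 40 patients (30.0%) had an adverse event"
print(perturb_numbers(statement))
print(perturb_acronyms(statement))
```

In this sketch, the numerical perturbation changes the quantities in a statement (which would typically alter its truth with respect to the report), while the acronym perturbation changes only the surface form.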

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
