FMI@SU ToxHabits: Evaluating LLMs Performance on Toxic Habit Extraction in Spanish Clinical Texts

Sylvia Vassileva; Ivan Koychev; Svetla Boytcheva

arXiv:2604.06403·cs.CL·April 9, 2026

FMI@SU ToxHabits: Evaluating LLMs Performance on Toxic Habit Extraction in Spanish Clinical Texts

Sylvia Vassileva, Ivan Koychev, Svetla Boytcheva

PDF

TL;DR

This paper evaluates the performance of large language models, especially GPT-4.1, in recognizing toxic habits in Spanish clinical texts, achieving an F1 score of 0.65.

Contribution

It introduces an approach using LLMs for toxic habit entity recognition in Spanish clinical texts, with optimized prompt strategies.

Findings

01

GPT-4.1's few-shot prompting outperformed other methods.

02

Achieved an F1 score of 0.65 on the test set.

03

Demonstrated feasibility of LLMs in non-English clinical NLP tasks.

Abstract

The paper presents an approach for the recognition of toxic habits named entities in Spanish clinical texts. The approach was developed for the ToxHabits Shared Task. Our team participated in subtask 1, which aims to detect substance use and abuse mentions in clinical case reports and classify them in four categories (Tobacco, Alcohol, Cannabis, and Drug). We explored various methods of utilizing LLMs for the task, including zero-shot, few-shot, and prompt optimization, and found that GPT-4.1's few-shot prompting performed the best in our experiments. Our method achieved an F1 score of 0.65 on the test set, demonstrating a promising result for recognizing named entities in languages other than English.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.