NOWJ @BioCreative IX ToxHabits: An Ensemble Deep Learning Approach for Detecting Substance Use and Contextual Information in Clinical Texts

Huu-Huy-Hoang Tran; Gia-Bao Duong; Quoc-Viet-Anh Tran; Thi-Hai-Yen Vuong; Hoang-Quynh Le

arXiv:2602.09469·cs.CL·February 11, 2026

Open Access

TL;DR

This paper introduces an ensemble deep learning system for detecting substance use and related context in Spanish clinical texts, addressing challenges in clinical NLP with low-resource data and improving detection accuracy.

Contribution

The paper presents a novel multi-output ensemble approach combining BETO and CRF for substance use detection in Spanish clinical texts, advancing low-resource clinical NLP methods.

Findings

01. Achieved 0.94 F1 and 0.97 precision in trigger detection.
02. Attained 0.91 F1 in argument detection.
03. Demonstrated effectiveness in low-resource clinical NLP.
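
The trigger-detection figures above report F1 and precision but not recall. As a rough check, recall can be backed out of the two reported numbers, assuming the F1 is the standard harmonic mean of precision and recall computed over the same set of predictions (the listing does not state the averaging scheme, and both inputs are rounded, so the result is only approximate). The function name `implied_recall` is illustrative.

```python
def implied_recall(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given F1 and P."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall exists for these values")
    return f1 * precision / denom


# Reported trigger-detection scores (rounded to two decimals in the source).
recall = implied_recall(f1=0.94, precision=0.97)
print(f"implied recall ~ {recall:.2f}")
```

Under that assumption the implied recall is about 0.91, which is consistent with a system tuned toward precision.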

Abstract

Extracting drug use information from unstructured Electronic Health Records remains a major challenge in clinical Natural Language Processing. While Large Language Models demonstrate advancements, their use in clinical NLP is limited by concerns over trust, control, and efficiency. To address this, we present the NOWJ submission to the ToxHabits Shared Task at BioCreative IX. This task targets the detection of toxic substance use and contextual attributes in Spanish clinical texts, a domain-specific, low-resource setting. We propose a multi-output ensemble system tackling both Subtask 1 - ToxNER and Subtask 2 - ToxUse. Our system integrates BETO with a CRF layer for sequence labeling, employs diverse training strategies, and uses sentence filtering to boost precision. Our top run achieved 0.94 F1 and 0.97 precision for Trigger Detection, and 0.91 F1 for Argument Detection.
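
The abstract does not say how the ensemble combines model outputs or how sentences are filtered. The sketch below is therefore not the authors' method. It shows one common way such a pipeline could be assembled: a strict per-token majority vote over BIO label sequences from several taggers, followed by a sentence-level filter that discards all predictions in sentences judged irrelevant. The label names (`B-TRIGGER`, `I-TRIGGER`), the function names, and the boolean relevance flag are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Merge per-token labels from several models.

    A label is kept only if more than half of the models agree on it;
    otherwise the token falls back to "O" (a precision-oriented choice).
    BIO consistency of the merged sequence is not repaired here.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all models must label the same number of tokens")
    merged = []
    for token_labels in zip(*predictions):
        label, count = Counter(token_labels).most_common(1)[0]
        merged.append(label if count > len(token_labels) / 2 else "O")
    return merged


def apply_sentence_filter(tags: List[str], sentence_is_relevant: bool) -> List[str]:
    """Drop every prediction in a sentence flagged as irrelevant.

    The flag is assumed to come from a separate sentence classifier.
    """
    return list(tags) if sentence_is_relevant else ["O"] * len(tags)


# Hypothetical outputs of three taggers for one four-token sentence.
model_a = ["O", "B-TRIGGER", "I-TRIGGER", "O"]
model_b = ["O", "B-TRIGGER", "O", "O"]
model_c = ["O", "B-TRIGGER", "I-TRIGGER", "B-TRIGGER"]

merged = majority_vote([model_a, model_b, model_c])
filtered = apply_sentence_filter(merged, sentence_is_relevant=False)
print(merged)
print(filtered)
```

Falling back to "O" when there is no strict majority trades recall for precision, which matches the abstract's stated goal for filtering but is a design choice of this sketch, not a reported detail.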

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Text Readability and Simplification