DART: A Structured Dataset of Regulatory Drug Documents in Italian for Clinical NLP
Mariano Barone, Antonio Laudante, Giuseppe Riccio, Antonio Romano, Marco Postiglione, Vincenzo Moscato

TL;DR
This paper introduces DART, a structured Italian dataset of regulatory drug documents that supports clinical NLP tasks such as drug interaction inference with large language models.
Contribution
DART is the first structured Italian corpus of regulatory drug documents, created through a reproducible pipeline and validated with an LLM-based drug interaction checker.
Findings
LLMs can accurately infer drug interactions using DART
DART enables extraction of key pharmacological information
The dataset supports clinical NLP applications in Italian healthcare
Abstract
The extraction of pharmacological knowledge from regulatory documents has become a key focus in biomedical natural language processing, with applications ranging from adverse event monitoring to AI-assisted clinical decision support. However, research in this field has predominantly relied on English-language corpora such as DrugBank, leaving a significant gap in resources tailored to other healthcare systems. To address this limitation, we introduce DART (Drug Annotation from Regulatory Texts), the first structured corpus of Italian Summaries of Product Characteristics derived from the official repository of the Italian Medicines Agency (AIFA). The dataset was built through a reproducible pipeline encompassing web-scale document retrieval, semantic segmentation of regulatory sections, and clinical summarization using a few-shot-tuned large language model with low-temperature decoding.…
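The segmentation step of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a Summary of Product Characteristics has already been extracted to plain text and that its sections carry the numbered headings of the standard EU template (for example "4.5 Interazioni con altri medicinali ed altre forme d'interazione"). The function name `split_sections`, the regular expression, and the sample text are all hypothetical and chosen only for illustration.

```python
import re

# Assumption: section headings start a line with a number such as "4" or "4.5",
# followed by a title beginning with an uppercase letter. A body line that
# happens to have the same shape would be misread as a heading, so a real
# pipeline would need a stricter check (e.g. a list of expected titles).
HEADING = re.compile(
    r"^(\d{1,2}(?:\.\d{1,2})?)\.?[ \t]+([A-ZÀ-Ý][^\n]*)$",
    re.MULTILINE,
)

def split_sections(text):
    """Map each section number to a (title, body) pair."""
    matches = list(HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # The body runs from the end of this heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = (m.group(2).strip(), text[m.end():end].strip())
    return sections

# Made-up sample text, not taken from DART or from any real document.
sample = """4.3 Controindicazioni
Ipersensibilità al principio attivo.
4.5 Interazioni con altri medicinali ed altre forme d'interazione
L'uso concomitante con anticoagulanti può aumentare il rischio di sanguinamento.
"""

sections = split_sections(sample)
title, body = sections["4.5"]
print(title)
print(body)
```

Keying the result by section number keeps the interaction section (4.5 in the standard template) easy to retrieve for a downstream step such as summarization or interaction checking.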
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Machine Learning in Healthcare
