Development of a European Union Time-Indexed Reference Dataset for Assessing the Performance of Signal Detection Methods in Pharmacovigilance using a Large Language Model
Maria Kefala, Jeffery L. Painter, Syed Tauhid Bukhari, Maurizio Sessa

TL;DR
This study created a time-indexed EU pharmacovigilance dataset with regulatory metadata, enabling better evaluation of signal detection methods by capturing when adverse events are officially recognized.
Contribution
It introduces a novel, comprehensive, time-sensitive dataset for EU drug safety monitoring, incorporating AE inclusion timing and regulatory updates.
Findings
The dataset includes 125,026 drug-AE associations from 1995-2025.
Most adverse events (74.5%) were identified before the drug was marketed.
The number of safety updates peaked around 2012.
Abstract
Background: The identification of optimal signal detection methods is hindered by the lack of reliable reference datasets. Existing datasets do not capture when adverse events (AEs) are officially recognized by regulatory authorities, preventing restriction of analyses to pre-confirmation periods and limiting evaluation of early detection performance. This study addresses this gap by developing a time-indexed reference dataset for the European Union (EU), incorporating the timing of AE inclusion in product labels along with regulatory metadata. Methods: Current and historical Summaries of Product Characteristics (SmPCs) for all centrally authorized products (n=1,513) were retrieved from the EU Union Register of Medicinal Products (data lock: 15 December 2025). Section 4.8 was extracted and processed using DeepSeek V3 to identify AEs. Regulatory metadata, including labelling changes,…
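To illustrate why the timing of label inclusion matters, the following minimal sketch shows how a time-indexed reference could be used to restrict an analysis to the pre-confirmation period. It is not the authors' pipeline: the `ReferenceEntry` and `Report` types, their field names, and the `pre_confirmation_reports` helper are hypothetical, and the example data are invented purely for demonstration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReferenceEntry:
    """Hypothetical row of a time-indexed reference dataset."""
    drug: str
    adverse_event: str
    label_inclusion_date: date  # assumed: date the AE first appeared in SmPC section 4.8


@dataclass(frozen=True)
class Report:
    """Hypothetical spontaneous report (drug, AE, date received)."""
    drug: str
    adverse_event: str
    received: date


def pre_confirmation_reports(reports, reference):
    """Keep only reports received before the AE was added to the product label."""
    inclusion = {
        (entry.drug, entry.adverse_event): entry.label_inclusion_date
        for entry in reference
    }
    kept = []
    for report in reports:
        cutoff = inclusion.get((report.drug, report.adverse_event))
        # Drop pairs absent from the reference and reports on or after the cutoff.
        if cutoff is not None and report.received < cutoff:
            kept.append(report)
    return kept


if __name__ == "__main__":
    # Invented example data, for illustration only.
    reference = [ReferenceEntry("drug_a", "rash", date(2012, 6, 1))]
    reports = [
        Report("drug_a", "rash", date(2011, 1, 1)),
        Report("drug_a", "rash", date(2013, 1, 1)),
    ]
    for report in pre_confirmation_reports(reports, reference):
        print(report)
```

With a reference that lacks inclusion dates, the second report above could not be excluded, so a signal detection method would be credited for "detecting" an association that was already on the label.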
