RxSafeBench: Identifying Medication Safety Issues of Large Language Models in Simulated Consultation
Jiahao Zhao, Luxin Xu, Minghuan Tan, Lichao Zhang, Ahmadreza Argha, Hamid Alinejad-Rokny, Min Yang

TL;DR
This paper introduces RxSafeBench, a comprehensive benchmark for evaluating medication safety in large language models within simulated clinical consultations, addressing a critical gap in healthcare AI safety assessment.
Contribution
The paper builds a realistic benchmark backed by a dedicated medication safety database, RxRisk DB, and evaluates LLMs' ability to recommend safe medications, highlighting their current limitations.
Findings
LLMs struggle with contraindication and interaction knowledge.
Risks are harder to detect when implied rather than explicit.
The benchmark enables systematic assessment of medication safety in LLMs.
Abstract
Numerous medical systems powered by Large Language Models (LLMs) have achieved remarkable progress in diverse healthcare tasks. However, research on their medication safety remains limited due to the lack of real-world datasets, constrained by privacy and accessibility issues. Moreover, evaluation of LLMs in realistic clinical consultation settings, particularly regarding medication safety, is still underexplored. To address these gaps, we propose a framework that simulates and evaluates clinical consultations to systematically assess the medication safety capabilities of LLMs. Within this framework, we generate inquiry-diagnosis dialogues with embedded medication risks and construct a dedicated medication safety database, RxRisk DB, containing 6,725 contraindications, 28,781 drug interactions, and 14,906 indication-drug pairs. A two-stage filtering strategy ensures clinical realism and…
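To make the three kinds of records in the abstract concrete, here is a minimal Python sketch of how a recommended drug might be checked against contraindications, drug interactions, and indication-drug pairs. It is an illustration only: the `Patient` class, the `check_recommendation` function, the table layout, and the toy entries are all assumptions made for this example and are not taken from RxRisk DB or from the paper's evaluation code.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    # Hypothetical patient record: known conditions and drugs already taken.
    conditions: set = field(default_factory=set)
    current_drugs: set = field(default_factory=set)


# Toy entries for illustration only; they are NOT drawn from RxRisk DB.
CONTRAINDICATIONS = {("pregnancy", "isotretinoin")}   # (condition, drug)
INTERACTIONS = {frozenset({"warfarin", "aspirin"})}   # unordered drug pairs
INDICATIONS = {("acne", "isotretinoin"), ("pain", "aspirin")}  # (indication, drug)


def check_recommendation(patient, indication, drug):
    """Return a list of safety issues found for recommending `drug`."""
    issues = []
    # 1. Is the drug listed for this indication at all?
    if (indication, drug) not in INDICATIONS:
        issues.append(f"{drug} is not listed for {indication}")
    # 2. Does any patient condition contraindicate the drug?
    for condition in sorted(patient.conditions):
        if (condition, drug) in CONTRAINDICATIONS:
            issues.append(f"{drug} is contraindicated with {condition}")
    # 3. Does the drug interact with anything the patient already takes?
    for other in sorted(patient.current_drugs):
        if frozenset({drug, other}) in INTERACTIONS:
            issues.append(f"{drug} interacts with {other}")
    return issues


at_risk = Patient(conditions={"pregnancy"}, current_drugs={"warfarin"})
print(check_recommendation(at_risk, "pain", "aspirin"))
print(check_recommendation(at_risk, "acne", "isotretinoin"))
print(check_recommendation(Patient(), "pain", "aspirin"))
```

In this sketch the risk factors are stated explicitly in the patient record. The findings above note that risks are harder to detect when they are only implied in the dialogue, which a lookup of this kind does not capture.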
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
