IMB: An Italian Medical Benchmark for Question Answering
Antonio Romano, Giuseppe Riccio, Mariano Barone, Marco Postiglione, Vincenzo Moscato

TL;DR
This paper introduces two extensive Italian medical question answering benchmarks, demonstrating that domain-specific models and retrieval techniques outperform larger general models in medical QA tasks.
Contribution
The paper presents the IMB-QA and IMB-MCQA benchmarks for Italian medical QA and shows that specialized models and retrieval methods improve performance relative to larger general-purpose models.
Findings
Domain-specific models outperform larger general models.
Retrieval-augmented generation improves answer accuracy.
Datasets and evaluation tools are publicly available.
Abstract
Online medical forums have long served as vital platforms where patients seek professional healthcare advice, generating vast amounts of valuable knowledge. However, the informal nature and linguistic complexity of forum interactions pose significant challenges for automated question answering systems, especially when dealing with non-English languages. We present two comprehensive Italian medical benchmarks: IMB-QA, containing 782,644 patient-doctor conversations from 77 medical categories, and IMB-MCQA, comprising 25,862 multiple-choice questions from medical specialty examinations. We demonstrate how Large Language Models (LLMs) can be leveraged to improve the clarity and consistency of medical forum data while retaining their original meaning and conversational style, and compare a variety of LLM architectures on both open and multiple-choice question answering…
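The abstract does not state how multiple-choice answers are scored. As an assumption only, a common choice for a benchmark like IMB-MCQA is exact-match accuracy on the predicted option letter. The sketch below illustrates that metric; the function names `normalize_choice` and `mcqa_accuracy` and the answer format (a leading option letter such as "B" or "b) ...") are hypothetical and are not taken from the released evaluation tools.

```python
# Hypothetical sketch: exact-match accuracy for multiple-choice QA.
# The answer format (a leading option letter) is an assumption,
# not the IMB-MCQA file format or the authors' scoring code.
from typing import Sequence


def normalize_choice(answer: str) -> str:
    """Reduce an answer such as ' b) fever' to its option letter 'B'."""
    stripped = answer.strip()
    return stripped[:1].upper() if stripped else ""


def mcqa_accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of questions whose predicted option letter matches the gold letter."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        normalize_choice(p) == normalize_choice(g)
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


if __name__ == "__main__":
    preds = ["A", " b) fever", "C", "d"]
    golds = ["A", "B", "D", "D"]
    print(mcqa_accuracy(preds, golds))  # 0.75
```

Normalizing to the first letter keeps the metric robust to models that echo the option text, at the cost of misreading answers that do not start with the letter; a real harness would likely need stricter parsing.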
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Artificial Intelligence in Healthcare and Education
