Polish-English medical knowledge transfer: A new benchmark and results
Łukasz Grzybowski, Jakub Pokrywka, Michał Ciesiółka, Jeremi I. Kaczmarek, Marek Kubis

TL;DR
This paper introduces a new Polish medical exam dataset with English translations, benchmarking various LLMs and revealing their strengths and limitations in medical knowledge transfer across languages.
Contribution
It presents a novel Polish medical exam benchmark dataset with English translations and systematically evaluates LLMs' performance on this resource.
Findings
GPT-4o approaches human performance
Models face challenges in cross-lingual translation
Performance varies across medical specialties
Abstract
Large Language Models (LLMs) have demonstrated significant potential in handling specialized tasks, including medical problem-solving. However, most studies predominantly focus on English-language contexts. This study introduces a novel benchmark dataset based on Polish medical licensing and specialization exams (LEK, LDEK, PES) taken by medical doctor candidates and practicing doctors pursuing specialization. The dataset was web-scraped from publicly available resources provided by the Medical Examination Center and the Chief Medical Chamber. It comprises over 24,000 exam questions, including a subset of parallel Polish-English corpora, where the English portion was professionally translated by the examination center for foreign candidates. By creating a structured benchmark from these existing exam questions, we systematically evaluate state-of-the-art LLMs, including general-purpose,…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Text Readability and Simplification
