ADMEDTAGGER: an annotation framework for distillation of expert knowledge for the Polish medical language
Franciszek Górski, Andrzej Czyżewski

TL;DR
This paper introduces ADMEDTAGGER, a framework that uses multilingual LLMs to distill expert knowledge for annotating Polish medical texts, resulting in efficient classifiers with high accuracy.
Contribution
It presents a novel annotation framework that leverages multilingual LLMs to generate training data for Polish medical text classification, reducing the need for manual expert annotation.
Findings
DistilBERT achieved F1 > 0.80 across categories.
The resulting classifiers are significantly smaller and faster than large language models.
Effective classifiers were developed with limited annotation resources.
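The F1 finding is easiest to read as a per-category (one-vs-rest) score. The paper page does not say exactly how it was computed, so this is only a minimal sketch under that assumption. It uses plain Python with a hypothetical helper `per_class_f1`, and the gold/predicted labels in the demo are invented toy values, not data from the paper.

```python
from collections import Counter


def per_class_f1(gold, pred, labels):
    """One-vs-rest F1 for each label, from parallel lists of gold and predicted labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for class g
        else:
            fp[p] += 1      # predicted p, but it was wrong
            fn[g] += 1      # missed an instance of g
    scores = {}
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


CATEGORIES = ["Radiology", "Oncology", "Cardiology", "Hypertension", "Pathology"]

# Toy labels for illustration only (not results from the paper).
gold = ["Radiology", "Oncology", "Cardiology", "Hypertension", "Pathology", "Oncology"]
pred = ["Radiology", "Oncology", "Cardiology", "Cardiology", "Pathology", "Oncology"]

scores = per_class_f1(gold, pred, CATEGORIES)
# Under this reading, "F1 > 0.80 across categories" means every per-class score exceeds 0.80.
print(all(score > 0.80 for score in scores.values()))  # False for this toy data
```

Checking the threshold per class, rather than on a single averaged score, matches the "across categories" phrasing. A macro average could still exceed 0.80 while one category falls below it.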
Abstract
In this work, we present an annotation framework that demonstrates how a multilingual LLM pretrained on a large corpus can be used as a teacher model to distill the expert knowledge needed for tagging medical texts in Polish. This work is part of a larger project called ADMEDVOICE, within which we collected an extensive corpus of medical texts representing five clinical categories: Radiology, Oncology, Cardiology, Hypertension, and Pathology. From these data we needed to build a multi-class classifier, but the fundamental obstacle was the lack of resources to annotate a sufficient number of texts. We therefore used the multilingual Llama3.1 model to annotate an extensive corpus of medical texts in Polish. With our limited annotation resources, we verified only a portion of these labels and used them as the test set. The data annotated in this way were…
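The teacher/verification workflow in the abstract can be sketched roughly as follows. The teacher LLM labels every text, and a random subset is held out for expert checking to form the test set. The rest becomes (unverified) training data for a smaller student classifier. This is a hedged illustration, not the authors' code. The prompt wording, `build_prompt`, `teacher_label`, `annotate_and_split`, `fake_llm`, the test fraction and the Polish sample sentences are all hypothetical. The `llm` argument is a placeholder for whatever wrapper calls Llama3.1.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

CATEGORIES = ("Radiology", "Oncology", "Cardiology", "Hypertension", "Pathology")


@dataclass
class Example:
    text: str
    label: str              # label proposed by the teacher LLM
    verified: bool = False  # True once an expert has checked the label


def build_prompt(text: str) -> str:
    # Hypothetical zero-shot prompt; not the prompt used by the authors.
    options = ", ".join(CATEGORIES)
    return (f"Assign the following Polish medical text to exactly one category "
            f"({options}). Answer with the category name only.\n\nText: {text}\nCategory:")


def teacher_label(text: str, llm: Callable[[str], str]) -> Optional[str]:
    # `llm` maps a prompt to a completion (e.g. a wrapper around Llama3.1; assumed here).
    answer = llm(build_prompt(text)).strip()
    return answer if answer in CATEGORIES else None  # drop answers outside the label set


def annotate_and_split(texts: List[str], llm: Callable[[str], str],
                       test_fraction: float = 0.1, seed: int = 0
                       ) -> Tuple[List[Example], List[Example]]:
    """Teacher-label every text, then reserve a random subset for expert verification."""
    examples = []
    for text in texts:
        label = teacher_label(text, llm)
        if label is not None:
            examples.append(Example(text, label))
    rng = random.Random(seed)
    rng.shuffle(examples)
    n_test = max(1, int(len(examples) * test_fraction))
    test, train = examples[:n_test], examples[n_test:]
    for ex in test:
        ex.verified = True  # stands in for an expert confirming or correcting ex.label
    return train, test


def fake_llm(prompt: str) -> str:
    # Offline stand-in for the teacher model: keyword rules on the text part of the prompt.
    text = prompt.split("Text:", 1)[1].lower()
    if "tomografia" in text:
        return "Radiology"
    if "nowotwór" in text:
        return "Oncology"
    return "Cardiology"


texts = (["Tomografia komputerowa klatki piersiowej bez zmian."] * 4
         + ["Nowotwór złośliwy płuca, stadium II."] * 4)
train, test = annotate_and_split(texts, fake_llm, test_fraction=0.25)
print(len(train), len(test))  # 6 2
```

Passing the model in as a plain callable keeps the sketch independent of any particular inference library. The held-out `test` list is the only part that would receive expert attention, which mirrors the abstract's point that annotation resources were spent on verification rather than on labeling the whole corpus.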
Taxonomy
Topics: Text Readability and Simplification · Biomedical Text Mining and Ontologies · Artificial Intelligence in Healthcare and Education
