BioPars: A Pretrained Biomedical Large Language Model for Persian Biomedical Text Mining
Baqer M. Merzah, Tania Taami, Salman Asoudeh, Saeed Mirzaee, Amir reza Hossein pour, and Amir Ali Bengari

TL;DR
BioPars is a Persian biomedical language model introduced together with a new dataset and evaluation benchmark for medical question answering in Persian. It is reported to outperform GPT-4 1.0 on the BioParsQA benchmark.
Contribution
Introduction of BioPars, the first Persian biomedical LLM, along with a new dataset and evaluation benchmarks for medical question answering.
Findings
BioPars outperforms GPT-4 1.0 on BioParsQA with a ROUGE-L score of 29.99.
BioPars achieves a BERTScore of 90.87 using MMR.
The results are promising but indicate that the model needs further fine-tuning.
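For readers unfamiliar with the ROUGE-L metric cited above, the following is a minimal sketch of how a sentence-level ROUGE-L F-score is commonly computed from the longest common subsequence (LCS) of tokens. It is not the paper's evaluation code: the whitespace tokenization, the `beta` weighting, and the function names `lcs_length` and `rouge_l` are assumptions made for illustration, and the paper's exact settings for Persian text are not stated in this summary. The sketch returns values in the range 0 to 1, whereas the reported 29.99 appears to be on a 0 to 100 scale.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """Sentence-level ROUGE-L F-score in [0, 1].

    Assumption: tokens are obtained by whitespace splitting, which is a
    simplification and may not match the paper's tokenizer.
    """
    ref = reference.split()
    cand = candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)


if __name__ == "__main__":
    # Hypothetical example strings, not taken from BioParsQA.
    print(rouge_l("a b c d", "a c d e"))
```

Because the score depends only on token order and overlap, a fluent answer that paraphrases the reference can score low, which is one reason the findings also report an embedding-based metric such as BERTScore.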
Abstract
Large Language Models (LLMs) have recently gained attention in the life sciences due to their capacity to model, extract, and apply complex biological information. Beyond their classical use as chatbots, these systems are increasingly used for complex analysis and problem-solving in specialized fields, including bioinformatics. First, we introduce BIOPARS-BENCH, a dataset from over 10,000 scientific articles, textbooks, and medical websites. BioParsQA was also introduced to evaluate the proposed model, which consists of 5,231 Persian medical questions and answers. This study then introduces BioPars, a simple but accurate measure designed to assess LLMs for three main abilities: acquiring subject-specific knowledge, interpreting and synthesizing such knowledge, and demonstrating proper evidence. Comparing ChatGPT, Llama, and Galactica, our study highlights their ability to remember and…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Biomedical Text Mining and Ontologies · Topic Modeling
