Igea: a Decoder-Only Language Model for Biomedical Text Generation in Italian
Tommaso Mario Buonocore, Simone Rancati, Enea Parimbelli

TL;DR
Igea is a novel decoder-only language model tailored for biomedical text generation in Italian, addressing the scarcity of domain-specific models for less-resourced languages and demonstrating strong performance across biomedical and general benchmarks.
Contribution
This paper introduces Igea, the first decoder-only language model designed for biomedical text generation in Italian. It is built on Minerva, released in three sizes, and evaluated on both in-domain biomedical corpora and general-purpose benchmarks.
Findings
Igea performs well on biomedical corpora.
Igea retains general knowledge after domain training.
The three model sizes (350 million, 1 billion, and 3 billion parameters) aim to balance computational efficiency and performance.
Abstract
The development of domain-specific language models has significantly advanced natural language processing applications in various specialized fields, particularly in biomedicine. However, the focus has largely been on English-language models, leaving a gap for less-resourced languages such as Italian. This paper introduces Igea, the first decoder-only language model designed explicitly for biomedical text generation in Italian. Built on the Minerva model and continually pretrained on a diverse corpus of Italian medical texts, Igea is available in three model sizes: 350 million, 1 billion, and 3 billion parameters. The models aim to balance computational efficiency and performance, addressing the challenges of managing the peculiarities of medical terminology in Italian. We evaluate Igea using a mix of in-domain biomedical corpora and general-purpose benchmarks, highlighting its efficacy…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Text Readability and Simplification
