Exploring the Use of Foundation Models for Named Entity Recognition and Lemmatization Tasks in Slavic Languages

Gabriela Pałka, Artur Nowakowski

arXiv:2304.05336 · cs.CL · April 12, 2023 · 1 citation

TL;DR

This paper explores the application of foundation models based on BERT and T5 to named entity recognition and lemmatization in Slavic languages, reports promising results, and announces a public release of the lemmatization models.

Contribution

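It applies BERT- and T5-based foundation models, supplemented with external datasets, to NER and lemmatization in Slavic languages as Adam Mickiewicz University's submission to the 4th Shared Task on SlavNER, achieving high metric scores in both tasks.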

Findings

01 High metric scores in both the NER and lemmatization tasks

02 External datasets used to further improve model quality

03 Lemmatization models to be made publicly available for further research

Abstract

This paper describes Adam Mickiewicz University's (AMU) solution for the 4th Shared Task on SlavNER. The task involves the identification, categorization, and lemmatization of named entities in Slavic languages. Our approach involved exploring the use of foundation models for these tasks. In particular, we used models based on the popular BERT and T5 model architectures. We also used external datasets to further improve the quality of our models. Our solution obtained promising results, achieving high metric scores in both tasks. We describe our approach and the results of our experiments in detail, showing that the method is effective for NER and lemmatization in Slavic languages. Our models for lemmatization will be available at: https://huggingface.co/amu-cai.
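
To make the task pipeline in the abstract concrete, here is a minimal sketch of the step that sits between identification and lemmatization: turning per-token labels into typed entity mentions. It assumes a BIO tagging scheme and the labels `PER` and `LOC`; the abstract does not state how the authors encode labels, so the scheme, the function name `bio_to_spans`, and the Polish example sentence are all illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (mention text, entity type) pairs.

    Assumption: tags look like "B-PER", "I-PER", or "O". This scheme is
    hypothetical here; the source does not specify the label encoding.
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any entity that was still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity: close and reset.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = []
            current_type = None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Illustrative sentence: "Adam Mickiewicz lived in Paris" (Polish).
tokens = ["Adam", "Mickiewicz", "mieszkał", "w", "Paryżu"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
```

Each extracted mention would then be passed to a lemmatization step that maps an inflected form such as "Paryżu" to its base form "Paryż". In the setup the abstract describes, that step is handled by a trained model rather than by rules.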

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Gated Linear Unit · Attention Dropout · Weight Decay · Multi-Head Attention · Linear Warmup With Linear Decay