NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems
Martyna Wiącek, Piotr Rybak, Łukasz Pszenny, Alina Wróblewska

TL;DR
The paper introduces a language-centric benchmarking system, inspired by GLUE, for fair and comprehensive evaluation of natural language preprocessing (NLPre) tools. The prototype is configured for Polish and can be adapted to other languages.
Contribution
The paper proposes a fair evaluation framework for NLPre tools that addresses the shortcomings of existing evaluation approaches, and provides a customizable benchmarking system that can be configured for different languages.
Findings
The benchmark is used for an extensive evaluation of Polish NLPre tools.
The system is shown to be adaptable to other languages, such as Irish and Chinese.
The resources for benchmarking and tool evaluation are publicly available.
Abstract
With the advancements of transformer-based architectures, we observe the rise of natural language preprocessing (NLPre) tools capable of solving preliminary NLP tasks (e.g. tokenisation, part-of-speech tagging, dependency parsing, or morphological analysis) without any external linguistic guidance. It is arduous to compare novel solutions to well-entrenched preprocessing toolkits, relying on rule-based morphological analysers or dictionaries. Aware of the shortcomings of existing NLPre evaluation approaches, we investigate a novel method of reliable and fair evaluation and performance reporting. Inspired by the GLUE benchmark, the proposed language-centric benchmarking system enables comprehensive ongoing evaluation of multiple NLPre tools, while credibly tracking their performance. The prototype application is configured for Polish and integrated with the thoroughly assembled NLPre-PL…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attentive Walk-Aggregating Graph Neural Network
