A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus
Eduard Poesina, Cornelia Caragea, Radu Tudor Ionescu

TL;DR
This paper introduces RoNLI, the first Romanian NLI corpus, and proposes a novel curriculum learning method based on data cartography to improve model performance.
Contribution
The paper presents the creation of the first Romanian NLI dataset and introduces a new curriculum learning strategy utilizing data cartography.
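As a rough illustration of how data cartography can drive a curriculum, the sketch below computes the two standard cartography statistics for each training example, confidence (mean probability assigned to the gold label across epochs) and variability (standard deviation of that probability), then orders examples from easy to hard and splits them into cumulative stages. This is a generic sketch under stated assumptions, not the authors' method: the function names, the tie-breaking rule, and the cumulative staging scheme are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def cartography_stats(gold_probs):
    """Compute per-example cartography statistics.

    gold_probs[i][e] is the probability the model assigned to the gold
    label of example i at training epoch e (assumed to be logged already).
    """
    return [
        {
            "index": i,
            "confidence": mean(probs),     # mean gold-label probability
            "variability": pstdev(probs),  # spread across epochs
        }
        for i, probs in enumerate(gold_probs)
    ]


def easy_to_hard_order(stats):
    """Rank examples from easy to hard.

    Assumption for this sketch: higher confidence means easier, and ties
    are broken in favour of lower variability.
    """
    ranked = sorted(stats, key=lambda s: (-s["confidence"], s["variability"]))
    return [s["index"] for s in ranked]


def curriculum_stages(order, num_stages):
    """Split an easy-to-hard ordering into cumulative training stages.

    Stage k contains the easiest k/num_stages fraction of the data, so the
    final stage covers every example.
    """
    n = len(order)
    stages = []
    for k in range(1, num_stages + 1):
        cutoff = (n * k + num_stages - 1) // num_stages  # ceiling division
        stages.append(order[:cutoff])
    return stages


# Toy gold-label probabilities for four examples over three epochs.
toy_probs = [
    [0.90, 0.95, 0.97],
    [0.20, 0.10, 0.15],
    [0.40, 0.70, 0.50],
    [0.80, 0.85, 0.90],
]
order = easy_to_hard_order(cartography_stats(toy_probs))
stages = curriculum_stages(order, num_stages=2)
```

In practice the per-epoch probabilities would come from a first training pass over the distantly supervised training set, and each stage would be used to train or fine-tune the model in turn.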
Findings
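Multiple baseline models are established for Romanian NLI, ranging from shallow models based on word embeddings to transformer-based neural networks.
The proposed cartography-based curriculum learning strategy improves model accuracy.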
Dataset and code are publicly available.
Abstract
Natural language inference (NLI), the task of recognizing the entailment relationship in sentence pairs, is an actively studied topic serving as a proxy for natural language understanding. Despite the relevance of the task in building conversational agents and improving text classification, machine translation and other NLP tasks, to the best of our knowledge, there is no publicly available NLI corpus for the Romanian language. To this end, we introduce the first Romanian NLI corpus (RoNLI) comprising 58K training sentence pairs, which are obtained via distant supervision, and 6K validation and test sentence pairs, which are manually annotated with the correct labels. We conduct experiments with multiple machine learning methods based on distant learning, ranging from shallow models based on word embeddings to transformer-based neural networks, to establish a set of competitive…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
