More Romanian word embeddings from the RETEROM project

Vasile Păiș; Dan Tufiș

arXiv:2111.10750·cs.CL·November 23, 2021·1 cites

TL;DR

This paper describes several sets of Romanian word embeddings developed within the ReTeRom project, which incorporate linguistic features such as lemmas and part-of-speech (POS) tags to support natural language processing tasks.

Contribution

It introduces new Romanian word embedding sets with different features, expanding on previous models by including lemmas and POS tags for improved NLP applications.

Findings

1. Existing embeddings based on word occurrences are augmented with lemma and POS features.
2. New embeddings enable better morphological, syntactic, and semantic analysis.
3. Graphical tools are developed for exploring the vector representations.

Abstract

Automatically learned vector representations of words, also known as "word embeddings", are becoming a basic building block for more and more natural language processing algorithms. There are different ways and tools for constructing word embeddings. Most of the approaches rely on raw texts, the construction items being the word occurrences and/or letter n-grams. More elaborated research is using additional linguistic features extracted after text preprocessing. Morphology is clearly served by vector representations constructed from raw texts and letter n-grams. Syntax and semantics studies may profit more from the vector representations constructed with additional features such as lemma, part-of-speech, syntactic or semantic dependants associated with each word. One of the key objectives of the ReTeRom project is the development of advanced technologies for Romanian natural language…
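
The distinction the abstract draws between construction items can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the example sentence, the UD-style POS tags, the `lemma_POS` joining convention, and the helper names `char_ngrams` and `construction_items` are all assumptions made for illustration. It only shows how the same annotated sentence might yield different training items (word forms, letter n-grams, lemmas, or lemma plus POS) before being passed to an embedding tool.

```python
from typing import List, Tuple

# Hypothetical annotated sentence: (word form, lemma, POS tag).
# The sentence and tags are illustrative, not taken from the ReTeRom corpora.
AnnotatedToken = Tuple[str, str, str]

SENTENCE: List[AnnotatedToken] = [
    ("Copiii", "copil", "NOUN"),
    ("citesc", "citi", "VERB"),
    ("cărți", "carte", "NOUN"),
]


def char_ngrams(word: str, n_min: int = 3, n_max: int = 4) -> List[str]:
    """Return letter n-grams of a word, padded with boundary markers < and >."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def construction_items(sentence: List[AnnotatedToken], mode: str) -> List[str]:
    """Build the items an embedding trainer would see, for a given feature mode.

    mode is one of: "word" (lowercased forms), "lemma", or "lemma_pos"
    (lemma and POS joined with an underscore, an assumed convention).
    """
    if mode == "word":
        return [form.lower() for form, _, _ in sentence]
    if mode == "lemma":
        return [lemma for _, lemma, _ in sentence]
    if mode == "lemma_pos":
        return [f"{lemma}_{pos}" for _, lemma, pos in sentence]
    raise ValueError(f"unknown mode: {mode}")


print(construction_items(SENTENCE, "word"))
print(construction_items(SENTENCE, "lemma_pos"))
print(char_ngrams("carte", 3, 3))
```

With lemma-based items, inflected forms such as "cărți" and "carte" share a single vector, while the `lemma_POS` variant additionally keeps homographs with different parts of speech apart.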

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems