LIDIOMS: A Multilingual Linked Idioms Data Set

Diego Moussallem, Mohamed Ahmed Sherif, Diego Esteves, Marcos Zampieri, and Axel-Cyrille Ngonga Ngomo

arXiv:1802.08148 · cs.CL · February 23, 2018 · 5 cites

TL;DR

LIDIOMS is a multilingual RDF data set that links idioms across five languages: English, German, Italian, Portuguese, and Russian. It supports NLP applications with idioms that were evaluated by native speakers and linked to existing multilingual resources such as BabelNet.

Contribution

The paper introduces a multilingual idiom data set, presents the model devised for structuring it as RDF, describes a quality evaluation by native speakers, and links the data to established multilingual data sets such as BabelNet.

Findings

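01. Contains idioms in five languages (English, German, Italian, Portuguese, and Russian), each evaluated by at least two native speakers

02. Links idioms to BabelNet and other well-known multilingual data sets

03. Follows the best practices of the Linguistic Linked Open Data community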

Abstract

In this paper, we describe the LIDIOMS data set, a multilingual RDF representation of idioms currently containing five languages: English, German, Italian, Portuguese, and Russian. The data set is intended to support natural language processing applications by providing links between idioms across languages. The underlying data was crawled and integrated from various sources. To ensure the quality of the crawled data, all idioms were evaluated by at least two native speakers. Herein, we present the model devised for structuring the data. We also provide the details of linking LIDIOMS to well-known multilingual data sets such as BabelNet. The resulting data set complies with best practices according to the Linguistic Linked Open Data community.
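To make the idea of "links between idioms across languages" concrete, here is a minimal sketch of idioms stored as subject-predicate-object triples and queried for cross-language equivalents. Everything in it is an assumption for illustration: the `http://example.org/` namespace, the predicate names (`ex:language`, `ex:writtenForm`, `ex:translation`), and the two sample idioms are invented here and are not taken from the LIDIOMS vocabulary or data. It uses plain Python rather than an RDF library.

```python
from collections import defaultdict

# Hypothetical namespace and predicate names, for illustration only.
# They are NOT the vocabulary used by LIDIOMS.
EX = "http://example.org/idiom/"
HAS_LANG = "ex:language"
HAS_TEXT = "ex:writtenForm"
TRANSLATION = "ex:translation"

# A tiny in-memory triple list: (subject, predicate, object).
# The idioms are made-up sample entries, not records from the data set.
TRIPLES = [
    (EX + "en_1", HAS_LANG, "en"),
    (EX + "en_1", HAS_TEXT, "it's raining cats and dogs"),
    (EX + "it_1", HAS_LANG, "it"),
    (EX + "it_1", HAS_TEXT, "piove a catinelle"),
    (EX + "en_1", TRANSLATION, EX + "it_1"),
]


def build_index(triples):
    """Group triples as subject -> predicate -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        index[subject][predicate].append(obj)
    return index


def translations(index, idiom_uri, target_lang):
    """Return written forms of idioms linked from idiom_uri in target_lang."""
    results = []
    for other in index[idiom_uri][TRANSLATION]:
        if target_lang in index[other][HAS_LANG]:
            results.extend(index[other][HAS_TEXT])
    return results


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(translations(idx, EX + "en_1", "it"))
```

In this sketch the translation link is stored in one direction only, so a lookup starting from the Italian entry finds nothing. A real linked data set would either materialize both directions or answer the question with a query over the graph.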

Code & Models

Repositories

dice-group/LIdioms (Official)

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies