LIDIOMS: A Multilingual Linked Idioms Data Set
Diego Moussallem, Mohamed Ahmed Sherif, Diego Esteves, Marcos Zampieri, and Axel-Cyrille Ngonga Ngomo

TL;DR
LIDIOMS is a multilingual RDF data set of idioms in five languages: English, German, Italian, Portuguese, and Russian. It supports NLP applications by linking idioms across languages and to existing multilingual resources such as BabelNet. Every idiom was evaluated by at least two native speakers.
Contribution
The paper introduces a new multilingual idiom data set, presents the RDF model used to structure it, reports a quality evaluation by native speakers, and links the data to established linguistic data sets such as BabelNet.
Findings
Contains idioms in five languages with evaluated quality
Links idioms to BabelNet and other datasets
Follows best practices in linguistic linked data
Abstract
In this paper, we describe the LIDIOMS data set, a multilingual RDF representation of idioms currently containing five languages: English, German, Italian, Portuguese, and Russian. The data set is intended to support natural language processing applications by providing links between idioms across languages. The underlying data was crawled and integrated from various sources. To ensure the quality of the crawled data, all idioms were evaluated by at least two native speakers. Herein, we present the model devised for structuring the data. We also provide the details of linking LIDIOMS to well-known multilingual data sets such as BabelNet. The resulting data set complies with the best practices of the Linguistic Linked Open Data community.
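To make the idea of cross-language idiom links concrete, here is a minimal sketch in plain Python of how idioms and their links could be held as RDF-style triples and serialized as N-Triples. The namespace `http://example.org/idiom/`, the `equivalentIdiom` property, and the identifiers are hypothetical and chosen only for illustration; they are not the vocabulary or model used by LIDIOMS. The three example idioms all mean "to die".

```python
# Hypothetical namespace and link property, for illustration only.
# These are NOT the vocabulary used by LIDIOMS.
EX = "http://example.org/idiom/"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EQUIVALENT = EX + "equivalentIdiom"  # assumed cross-language link property


def idiom_triples(idiom_id, text, lang):
    """Return the triples for one idiom: a language-tagged label."""
    return [(EX + idiom_id, LABEL, (text, lang))]


def link(a_id, b_id):
    """Return a symmetric pair of cross-language link triples."""
    return [(EX + a_id, EQUIVALENT, EX + b_id),
            (EX + b_id, EQUIVALENT, EX + a_id)]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines.

    Literals are written without escaping, so this sketch only handles
    text free of quotes, backslashes, and line breaks.
    """
    lines = []
    for s, p, o in triples:
        if isinstance(o, tuple):
            text, lang = o
            obj = '"%s"@%s' % (text, lang)
        else:
            obj = "<%s>" % o
        lines.append("<%s> <%s> %s ." % (s, p, obj))
    return lines


def equivalents(triples, idiom_id):
    """Map language tag -> idiom text for idioms directly linked to idiom_id."""
    labels = {s: o for s, p, o in triples if p == LABEL}
    subject = EX + idiom_id
    linked = [o for s, p, o in triples if s == subject and p == EQUIVALENT]
    return {labels[o][1]: labels[o][0] for o in linked}


graph = []
graph += idiom_triples("en_kick_the_bucket", "kick the bucket", "en")
graph += idiom_triples("de_den_loeffel_abgeben", "den Löffel abgeben", "de")
graph += idiom_triples("pt_bater_as_botas", "bater as botas", "pt")
graph += link("en_kick_the_bucket", "de_den_loeffel_abgeben")
graph += link("en_kick_the_bucket", "pt_bater_as_botas")

for line in to_ntriples(graph):
    print(line)
print(equivalents(graph, "en_kick_the_bucket"))
```

Storing each link in both directions keeps lookups simple at the cost of duplicate triples; a real RDF store would more likely declare the property symmetric or query in both directions.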
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
