Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets

Fanchao Qi; Liang Chang; Maosong Sun; Sicong Ouyang; Zhiyuan Liu

arXiv:1912.01795·cs.CL·December 5, 2019·1 cites

TL;DR

This paper introduces a method to automatically predict sememes for BabelNet synsets, aiming to create a multilingual sememe knowledge base that enhances NLP applications across languages.

Contribution

It presents a new dataset of sememe annotations for BabelNet synsets and proposes two models for automatic sememe prediction to expand multilingual semantic resources.

Findings

01. Effective models for sememe prediction demonstrated.

02. Analysis of factors influencing prediction accuracy.

03. A new multilingual sememe knowledge base resource.

Abstract

A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs are built on only a few languages, which hinders their widespread utilization. To address the issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB. It manually annotates sememes for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information of synsets. Finally, we conduct quantitative and qualitative analyses to…
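To make the task concrete, here is a minimal Python sketch of sememe prediction framed as multi-label ranking: each annotated synset carries a set of sememes, and an unannotated synset receives sememes ranked by similarity-weighted votes from the annotated ones. The synset IDs, sememe labels, toy vectors, and the `predict_sememes` helper are all hypothetical illustrations. This neighbor-voting baseline is an assumption chosen for simplicity, not one of the two models the paper proposes.

```python
import math
from collections import defaultdict

# Hypothetical seed data: synset ID -> set of sememes.
# IDs and labels are illustrative placeholders, not entries from the real dataset.
seed_annotations = {
    "bn:synset_a": {"human", "occupation", "teach"},
    "bn:synset_b": {"human", "occupation", "study"},
    "bn:synset_c": {"animal", "fly"},
}

# Hypothetical synset embeddings (toy 3-dimensional vectors).
embeddings = {
    "bn:synset_a": [0.9, 0.1, 0.0],
    "bn:synset_b": [0.8, 0.2, 0.1],
    "bn:synset_c": [0.0, 0.1, 0.9],
    "bn:synset_q": [0.85, 0.15, 0.05],  # unannotated query synset
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_sememes(query_id, top_k=2):
    """Rank sememes for an unannotated synset by similarity-weighted votes."""
    scores = defaultdict(float)
    for synset_id, sememes in seed_annotations.items():
        sim = cosine(embeddings[query_id], embeddings[synset_id])
        for sememe in sememes:
            scores[sememe] += sim
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sememe for sememe, _ in ranked[:top_k]]


print(predict_sememes("bn:synset_q"))
```

Because the query vector sits close to the two "human occupation" synsets, their shared sememes rank first. A real system would replace the toy vectors with learned synset representations and evaluate the ranked list against held-out annotations.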

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Bioinformatics