Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets
Fanchao Qi, Liang Chang, Maosong Sun, Sicong Ouyang, Zhiyuan Liu

TL;DR
This paper introduces a method to automatically predict sememes for BabelNet synsets, aiming to create a multilingual sememe knowledge base that enhances NLP applications across languages.
Contribution
It presents a new dataset of sememe annotations for BabelNet synsets and proposes two models for automatic sememe prediction to expand multilingual semantic resources.
Findings
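The two proposed models predict sememes for BabelNet synsets effectively.
Quantitative and qualitative analyses examine the factors that influence prediction accuracy.
The annotated seed dataset is a new resource for building a multilingual sememe knowledge base.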
Abstract
A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks. However, existing sememe KBs cover only a few languages, which hinders their widespread use. To address this issue, we propose to build a unified sememe KB for multiple languages based on BabelNet, a multilingual encyclopedic dictionary. We first build a dataset that serves as the seed of the multilingual sememe KB. It contains manually annotated sememes for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of automatic sememe prediction for synsets, aiming to expand the seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information about synsets. Finally, we conduct quantitative and qualitative analyses to…
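To make the task concrete, the following is a minimal sketch of sememe prediction framed as ranking candidate sememes for an unannotated synset. It is not either of the paper's two models: it swaps in a plain nearest-neighbour vote over synset embeddings. The synset IDs, the sememe labels, the 2-d vectors, and the function name `predict_sememes` are all hypothetical toy values chosen for illustration.

```python
import math
from collections import defaultdict

# Toy seed annotations (hypothetical): synset ID -> set of sememes.
SEED = {
    "syn:husband": {"human", "family", "male", "spouse"},
    "syn:wife": {"human", "family", "female", "spouse"},
    "syn:apple": {"fruit"},
}

# Toy 2-d synset embeddings (hypothetical values, not real BabelNet vectors).
EMB = {
    "syn:husband": (1.0, 0.1),
    "syn:wife": (0.9, 0.2),
    "syn:apple": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_sememes(query_vec, seed, emb, top_k=3):
    """Rank sememes by summed similarity to annotated synsets that carry them."""
    scores = defaultdict(float)
    for synset_id, sememes in seed.items():
        sim = cosine(query_vec, emb[synset_id])
        if sim <= 0:
            continue  # ignore dissimilar synsets
        for sememe in sememes:
            scores[sememe] += sim
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sememe for sememe, _ in ranked[:top_k]]


if __name__ == "__main__":
    # A query vector close to the "husband"/"wife" synsets.
    print(predict_sememes((1.0, 0.15), SEED, EMB, top_k=3))
```

In this framing, expanding the seed dataset amounts to running such a ranker over unannotated synsets and keeping the top-scoring sememes, subject to whatever threshold or review step the KB builders choose.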
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Bioinformatics
