Sememe Prediction: Learning Semantic Knowledge from Unstructured Textual Wiki Descriptions
Wei Li, Xuancheng Ren, Damai Dai, Yunfang Wu, Houfeng Wang, Xu Sun

TL;DR
This paper introduces a novel sequence-to-sequence model for automatically predicting semantic units called sememes from wiki descriptions, significantly improving over baselines and amateur human annotators.
Contribution
It proposes the LD-seq2seq model with a soft loss function for weakly ordered multi-label sememe prediction from textual descriptions.
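To make the idea of a soft loss for a weakly ordered label set concrete, here is a minimal sketch. It is not the paper's formulation: the mixing weight `alpha`, the uniform spread over the gold set, the helper names, and the example sememes are all assumptions chosen for illustration. It only shows why a softened target penalizes a prediction less when it favors a correct sememe at a different position in the sequence.

```python
import math

def cross_entropy(pred, target):
    """Cross-entropy of a predicted distribution against a target distribution.
    Both are dicts mapping label -> probability."""
    return -sum(p * math.log(pred[label]) for label, p in target.items() if p > 0)

def hard_target(step_label):
    """Standard seq2seq target: all mass on the label expected at this step."""
    return {step_label: 1.0}

def soft_target(step_label, label_set, alpha=0.7):
    """Assumed soft target: weight `alpha` on this step's label, with the
    remaining mass spread uniformly over the whole gold label set."""
    share = (1.0 - alpha) / len(label_set)
    target = {label: share for label in label_set}
    target[step_label] += alpha
    return target

# Hypothetical gold sememes for one word, in annotation order.
gold = ["human", "occupation", "education"]

# Hypothetical decoder output at step 0: it prefers "occupation", which is a
# correct sememe but not the one listed first.
pred = {"human": 0.2, "occupation": 0.6, "education": 0.1, "other": 0.1}

hard_loss = cross_entropy(pred, hard_target(gold[0]))
soft_loss = cross_entropy(pred, soft_target(gold[0], gold))
print(f"hard loss: {hard_loss:.3f}, soft loss: {soft_loss:.3f}")
```

With these made-up numbers the soft loss comes out lower than the hard loss, because part of the target mass sits on "occupation", the label the model actually favored. A strictly ordered loss would treat that prediction as no better than any other wrong label.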
Findings
LD-seq2seq outperforms all baseline models.
The model surpasses amateur human annotators.
Results demonstrate effective automatic sememe prediction.
Abstract
Huge numbers of new words emerge every day, leading to a great need for representing them with semantic meaning that is understandable to NLP systems. Sememes are defined as the minimum semantic units of human languages, the combination of which can represent the meaning of a word. Manual construction of sememe-based knowledge bases is time-consuming and labor-intensive. Fortunately, communities are devoted to composing descriptions of words on wiki websites. In this paper, we explore automatically predicting lexical sememes from the descriptions of words on wiki websites. We view this problem as a weakly ordered multi-label task and propose a Label Distributed seq2seq model (LD-seq2seq) with a novel soft loss function to solve it. In the experiments, we take a real-world sememe knowledge base HowNet and the corresponding descriptions of the words in Baidu…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
