The Analysis about Building Cross-lingual Sememe Knowledge Base Based on Deep Clustering Network
Xiaoran Li, Toshiaki Takano

TL;DR
This paper introduces an unsupervised deep clustering network approach to construct a multilingual sememe knowledge base, reducing reliance on manual annotations and capturing core semantic features across languages.
Contribution
It proposes a novel unsupervised method using deep clustering networks for building sememe KBs applicable to any language, leveraging multilingual word representations.
Findings
Low-dimensional sememe space retains main semantic features
Unsupervised approach reduces manual annotation biases
Method effective across multiple languages
Abstract
A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks, and we believe that by learning the smallest unit of meaning, computers can more easily understand human language. However, existing sememe KBs are built solely through manual annotation. Human annotations carry personal biases of understanding, the meaning of vocabulary is constantly updated and changes over time, and manual methods are not always practical. To address this issue, we propose an unsupervised method based on a deep clustering network (DCN) to build a sememe KB; the method can be used to build a KB for any language. We first learn the distributed representation of multilingual words, use MUSE to align them in a single vector space, learn the multi-layer meaning of…
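The clustering idea in the abstract can be illustrated with a small sketch. This is not the authors' DCN: it is plain k-means used as a stand-in for the clustering step only, with no autoencoder and no MUSE alignment. The 2-D vectors are made-up toy values standing in for word embeddings that are assumed to be already aligned in one cross-lingual space, and the function name `kmeans` and the word list are hypothetical.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids

# Toy vectors (assumption): English and French words that are assumed to
# share one aligned embedding space.
words = ["dog", "car", "chien", "voiture"]
vectors = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]

labels, centroids = kmeans(vectors, k=2)

# Group words by cluster id; each cluster plays the role of one
# sememe-like unit shared across languages.
clusters = {}
for word, lab in zip(words, labels):
    clusters.setdefault(lab, []).append(word)
print(clusters)
```

In this toy setting, translation pairs fall into the same cluster because their vectors are close after alignment. A deep clustering network would additionally learn a low-dimensional representation before or while clustering, which this sketch omits.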
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: ALIGN
