Language Modeling with Sparse Product of Sememe Experts
Yihong Gu, Jun Yan, Hao Zhu, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Fen Lin, Leyu Lin

TL;DR
This paper introduces SDLM, a sememe-driven language model that predicts words based on their underlying semantic units, enhancing interpretability and robustness over traditional word-based models.
Contribution
The paper proposes a novel sememe-based approach to language modeling, leveraging sememes as semantic experts to improve interpretability and performance.
Findings
SDLM outperforms traditional models in language modeling tasks.
SDLM improves headline generation quality.
Sememe-based modeling enhances model robustness.
Abstract
Most language modeling methods rely on large-scale data to statistically learn the sequential patterns of words. In this paper, we argue that words are atomic language units but not necessarily atomic semantic units. Inspired by HowNet, we use sememes, the minimum semantic units in human languages, to represent the implicit semantics behind words for language modeling, and name the resulting model the Sememe-Driven Language Model (SDLM). More specifically, to predict the next word, SDLM first estimates the sememe distribution given the textual context. Afterward, it regards each sememe as a distinct semantic expert, and these experts jointly identify the most probable senses and the corresponding word. In this way, SDLM enables language models to work beyond word-level manipulation to fine-grained sememe-level semantics and offers us more powerful tools to fine-tune language models and improve the interpretability as…
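The three-step prediction described in the abstract can be sketched as follows. This is a minimal, simplified illustration under stated assumptions, not the authors' implementation: the sigmoid sememe predictor, the log-linear form of each expert, the per-sense bias, and every name here (sememe_probs, sense_probs, word_probs, V, U, the toy sememes and vocabulary) are hypothetical. It is meant only to show the structure: per-sememe probabilities estimated from the context, each sense scored by a product of the experts for its own sparse sememe set, and a word's probability obtained by summing over its senses.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sememe_probs(g, V, b):
    """Step 1 (assumed form): independent probability q_k that sememe k fits context g."""
    return [sigmoid(dot(g, v_k) + b_k) for v_k, b_k in zip(V, b)]

def sense_probs(g, q, U, sense_sememes, sense_bias):
    """Step 2 (hypothetical simplified form): sparse product of sememe experts.

    Only the sememes annotated for a sense (its sparse set X_s) act as experts for it.
    Expert k contributes the log-potential q_k * (g . U[k]); a product of experts is a
    sum of log-potentials, and a softmax normalizes the scores over all senses.
    """
    scores = [b_s + sum(q[k] * dot(g, U[k]) for k in X_s)
              for X_s, b_s in zip(sense_sememes, sense_bias)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def word_probs(p_sense, word_senses):
    """Step 3: P(w | g) is the sum of the probabilities of the senses of w."""
    return {w: sum(p_sense[s] for s in senses) for w, senses in word_senses.items()}

# Toy, hypothetical data: 3 sememes (0 = fruit, 1 = computer, 2 = brand), 2-d context.
g = [1.0, 0.5]                                # context vector, e.g. an RNN hidden state
V = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]    # sememe predictor weights
b = [0.0, 0.0, 0.0]                           # sememe predictor biases
U = [[0.5, 0.5], [1.0, -1.0], [0.2, 0.2]]    # expert parameters, one vector per sememe
sense_sememes = [[0], [1, 2], [0]]            # apple (fruit), apple (brand), pear (fruit)
sense_bias = [0.0, 0.0, 0.0]
word_senses = {"apple": [0, 1], "pear": [2]}

q = sememe_probs(g, V, b)
p_sense = sense_probs(g, q, U, sense_sememes, sense_bias)
p_word = word_probs(p_sense, word_senses)
print(p_word)
```

Because the sense distribution is normalized before words are formed, word probabilities stay normalized as long as every sense belongs to exactly one word. Sparsity in this sketch comes only from each sense consulting the few sememes annotated for it, rather than all K experts.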
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Interpretability
