Distilled Wasserstein Learning for Word Embedding and Topic Modeling
Hongteng Xu, Wenlin Wang, Wei Liu, Lawrence Carin

TL;DR
This paper introduces a novel Wasserstein-based joint learning framework with distillation for word embeddings and topic modeling, improving convergence and performance in clinical data analysis.
Contribution
It presents a unified Wasserstein framework with a distillation mechanism that learns word embeddings and topics simultaneously, giving the embedding updates more robust guidance and applying the result to clinical admission records.
Findings
Improved disease network construction
Enhanced mortality prediction accuracy
Effective procedure recommendation
Abstract
We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying distance in the Wasserstein topic model. The word distributions of topics, their optimal transports to the word distributions of documents, and the embeddings of words are learned in a unified framework. When learning the topic model, we leverage a distilled underlying distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. This strategy provides robust guidance for updating the word embeddings, improving algorithmic convergence. As an application, we focus on patient admission records, in which the proposed method embeds the codes of diseases and procedures and learns the…
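The central idea in the abstract is that distances between word embeddings can serve as the ground cost of a Wasserstein distance between a topic's word distribution and a document's word distribution. Below is a minimal sketch in Python. It assumes the "smooth" optimal transport is entropy-regularized OT solved with Sinkhorn iterations, which is a common choice but is not confirmed here as the paper's solver. The function names `euclidean_cost` and `sinkhorn`, the parameters `eps` and `n_iter`, and the toy embeddings and distributions are hypothetical illustrations, not the authors' code. The distillation of the distance matrix is not modeled.

```python
import math

# Hypothetical toy setup: a 3-word vocabulary (e.g. three admission codes)
# with made-up 2-D embeddings. In the paper the embeddings are learned.
EMBEDDINGS = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]


def euclidean_cost(emb):
    """Ground-distance matrix: C[i][j] = ||e_i - e_j||_2 between word embeddings."""
    return [[math.dist(u, v) for v in emb] for u in emb]


def sinkhorn(p, q, cost, eps=0.5, n_iter=500):
    """Entropy-regularized optimal transport between word histograms p and q.

    Assumption: Sinkhorn scaling stands in for the paper's smooth OT solver.
    Returns the transport plan T (row sums ~p, column sums ~q)
    and its transport cost sum_ij T[i][j] * cost[i][j].
    """
    n, m = len(p), len(q)
    # Gibbs kernel of the cost; a larger eps gives a smoother, more spread-out plan.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan matches both marginals.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    T = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(T[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return T, total


if __name__ == "__main__":
    C = euclidean_cost(EMBEDDINGS)
    topic = [0.8, 0.1, 0.1]  # hypothetical topic's word distribution
    doc = [0.1, 0.1, 0.8]    # hypothetical document's word distribution
    _, d_same = sinkhorn(topic, topic, C)
    _, d_far = sinkhorn(topic, doc, C)
    print(f"topic->topic: {d_same:.3f}, topic->doc: {d_far:.3f}")
```

Because the cost comes from the embeddings, moving the embeddings changes every transport cost at once. This is what couples embedding learning to topic learning in a joint framework. A smaller `eps` brings the result closer to the unregularized Wasserstein distance, but Sinkhorn then typically needs more iterations to converge.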
Taxonomy
Topics: Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging · Topic Modeling
