Distilled Wasserstein Learning for Word Embedding and Topic Modeling

Hongteng Xu; Wenlin Wang; Wei Liu; Lawrence Carin

arXiv:1809.04705 · cs.LG · September 14, 2018 · 33 cites

TL;DR

This paper introduces a Wasserstein-based framework with a distillation mechanism that jointly learns word embeddings and topics. The distillation step improves algorithmic convergence, and the method is applied to patient admission records.

Contribution

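It presents a unified Wasserstein framework that learns word embeddings, the word distributions of topics, and the optimal transports from topics to documents together. A distilled underlying distance matrix is used when updating the topics, which gives the word embedding updates more robust guidance.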

Findings

01 Improved disease network construction

02 Enhanced mortality prediction accuracy

03 Effective procedure recommendation

Abstract

We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying distance in the Wasserstein topic model. The word distributions of topics, their optimal transports to the word distributions of documents, and the embeddings of words are learned in a unified framework. When learning the topic model, we leverage a distilled underlying distance matrix to update the topic distributions and smoothly calculate the corresponding optimal transports. Such a strategy provides the updating of word embeddings with robust guidance, improving the algorithmic convergence. As an application, we focus on patient admission records, in which the proposed method embeds the codes of diseases and procedures and learns the…
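The central idea in the abstract, using Euclidean distances between word embeddings as the ground cost of an optimal transport problem between a topic and a document, can be illustrated with a small sketch. This is not the authors' algorithm: it uses standard Sinkhorn iterations for entropic-regularized transport as a stand-in, it omits the distillation step and the embedding updates, and the vocabulary size, embedding dimension, histograms, and the `reg` and `n_iters` parameters are all hypothetical values chosen for illustration.

```python
import numpy as np

def pairwise_euclidean(embeddings):
    """Return the N x N matrix of Euclidean distances between embedding rows."""
    sq_norms = np.sum(embeddings ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embeddings @ embeddings.T
    # Clip tiny negative values caused by floating-point error before the root.
    return np.sqrt(np.maximum(sq_dists, 0.0))

def sinkhorn_plan(a, b, cost, reg=0.2, n_iters=1000):
    """Entropic-regularized transport plan between histograms a and b.

    Assumption: reg is expressed relative to the largest entry of cost,
    so the kernel stays well scaled regardless of the embedding norm.
    """
    kernel = np.exp(-cost / (reg * cost.max()))
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # rescale to match column marginals
        u = a / (kernel @ v)    # rescale to match row marginals
    return u[:, None] * kernel * v[None, :]

# Hypothetical toy setup: 5 words with 3-dimensional embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 3))
cost = pairwise_euclidean(embeddings)

# Hypothetical word distributions of one topic and one document.
topic = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
document = np.array([0.1, 0.1, 0.2, 0.3, 0.3])

plan = sinkhorn_plan(topic, document, cost)
distance = float(np.sum(plan * cost))  # approximate transport cost under the plan
```

Here `plan[i, j]` is the mass moved from word `i` in the topic to word `j` in the document, and `distance` is the resulting transport cost. Because the cost matrix is computed from the embeddings, changing the embeddings changes the transport, which is the coupling that makes joint learning possible.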

Taxonomy

Topics: Machine Learning in Healthcare · Radiomics and Machine Learning in Medical Imaging · Topic Modeling