Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data

Andrew L. Beam; Benjamin Kompa; Allen Schmaltz; Inbar Fried; Griffin Weber; Nathan P. Palmer; Xu Shi; Tianxi Cai; Isaac S. Kohane

arXiv:1804.01486·cs.CL·August 21, 2019

TL;DR

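This paper introduces cui2vec, a set of embeddings for 108,477 medical concepts learned from insurance claims, clinical notes, and full-text biomedical journal articles. On a new benchmark based on statistical power, cui2vec attains state-of-the-art performance relative to previous methods in most instances. Pre-trained embeddings and tools for exploring them are provided for research use.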

Contribution

The paper presents what the authors describe as the largest set of medical concept embeddings learned from multimodal data, and introduces a new benchmark methodology, based on statistical power, for evaluating such embeddings.
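The summary only states that the benchmark is "based on statistical power", so the following is a minimal sketch of one way such a test could look, not the paper's procedure. It assumes a null distribution built from cosine similarities of randomly drawn concept pairs, and reports the fraction of known related pairs whose similarity exceeds the upper `alpha` quantile of that null. The function name `benchmark_power` and the parameters `n_null`, `alpha`, and `seed` are hypothetical, and the data is synthetic.

```python
import numpy as np

def benchmark_power(emb, known_pairs, n_null=10_000, alpha=0.05, seed=0):
    """Fraction of known related pairs that beat a random-pair null (assumed design)."""
    rng = np.random.default_rng(seed)
    n = emb.shape[0]
    # Normalize rows so a dot product equals cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Null distribution: similarities of randomly drawn, distinct concept pairs.
    i = rng.integers(0, n, size=n_null)
    j = rng.integers(0, n, size=n_null)
    keep = i != j
    null = np.sum(unit[i[keep]] * unit[j[keep]], axis=1)
    threshold = np.quantile(null, 1.0 - alpha)
    # Power: share of known pairs whose similarity exceeds the null threshold.
    observed = np.array([unit[a] @ unit[b] for a, b in known_pairs])
    return float(np.mean(observed > threshold))

# Synthetic stand-in for an embedding matrix: 200 "concepts" in 16 dimensions.
rng = np.random.default_rng(42)
toy_emb = rng.normal(size=(200, 16))
# Plant ten related pairs by making concept k+10 a noisy copy of concept k.
planted_pairs = [(k, k + 10) for k in range(10)]
for a, b in planted_pairs:
    toy_emb[b] = toy_emb[a] + 0.01 * rng.normal(size=16)

power = benchmark_power(toy_emb, planted_pairs)
print(f"power on planted pairs: {power:.2f}")
```

With real embeddings, `known_pairs` would come from a curated list of related concepts; which lists the paper uses is not stated in this summary.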

Findings

1. Attained state-of-the-art performance relative to previous methods in most instances of the benchmark.
2. Created the largest set of embeddings to date, covering 108,477 medical concepts.
3. Provided accessible tools and pre-trained embeddings for the research community.

Abstract

Word embeddings are a popular approach to unsupervised learning of word relationships that are widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data. Leaning on recent theoretical insights, we demonstrate how an insurance claims database of 60 million members, a collection of 20 million clinical notes, and 1.7 million full text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts. To evaluate our approach, we present a new benchmark methodology based on statistical power specifically designed to test embeddings of medical concepts. Our approach, called cui2vec, attains state-of-the-art performance relative to previous methods in most instances.…
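The abstract does not say how co-occurrence data is turned into vectors, so the sketch below illustrates one common technique rather than the authors' method: factorizing a positive pointwise mutual information (PPMI) matrix with a truncated SVD. The function names `ppmi_svd_embeddings` and `cosine` and the toy count matrix are hypothetical. In the toy data, concepts 0 to 2 co-occur mostly with each other, as do concepts 3 to 5.

```python
import numpy as np

def ppmi_svd_embeddings(cooc, dim):
    """Embed concepts by truncated SVD of a PPMI matrix (illustrative technique)."""
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    # Counts expected if concepts co-occurred independently.
    expected = row @ col / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc / expected)
    # Keep only positive, finite PMI values; everything else becomes 0.
    ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
    u, s, _ = np.linalg.svd(ppmi, full_matrices=False)
    # Scale the leading singular vectors by the square root of the singular values.
    return u[:, :dim] * np.sqrt(s[:dim])

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy symmetric co-occurrence counts for six hypothetical concepts.
toy_counts = np.array([
    [0, 5, 8, 1, 0, 0],
    [5, 0, 8, 0, 0, 0],
    [8, 8, 0, 0, 0, 0],
    [1, 0, 0, 0, 5, 8],
    [0, 0, 0, 5, 0, 8],
    [0, 0, 0, 8, 8, 0],
])

emb = ppmi_svd_embeddings(toy_counts, dim=2)
print("similarity within a group:", cosine(emb[0], emb[1]))
print("similarity across groups: ", cosine(emb[0], emb[3]))
```

If counts from several sources were pooled before factorization, for example by summing their co-occurrence matrices, that pooling step would be a further assumption; the abstract only says the sources "can be combined to embed concepts into a common space".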
