SECNLP: A Survey of Embeddings in Clinical Natural Language Processing
Kalyan KS, S Sangeetha

TL;DR
This survey reviews the landscape of embedding techniques in Clinical Natural Language Processing, discussing corpora, models, evaluation methods, challenges, and future directions to advance clinical NLP research.
Contribution
It provides the first comprehensive classification and comparison of clinical embeddings, along with evaluation strategies and future research directions.
Findings
Nine types of clinical embeddings classified and discussed.
Comparison of popular embedding models in clinical NLP.
Identification of challenges and potential solutions in clinical embeddings.
Abstract
Traditional representations like bag of words are high dimensional and sparse, and they ignore word order as well as syntactic and semantic information. Distributed vector representations, or embeddings, map variable-length text to dense fixed-length vectors and capture prior knowledge that can be transferred to downstream tasks. Even though embeddings have become the de facto standard for representations in deep learning based NLP tasks in both general and clinical domains, there is no survey paper that presents a detailed review of embeddings in Clinical Natural Language Processing. In this survey paper, we discuss various medical corpora and their characteristics as well as medical codes, and present a brief overview and comparison of popular embedding models. We classify clinical embeddings into nine types and discuss each embedding type in detail. We discuss various evaluation methods…
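The contrast the abstract draws between sparse bag-of-words vectors and dense fixed-length embeddings can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two example sentences, the helper names, the 4-dimensional size, and the randomly initialised embedding table, which merely stands in for a trained model. Averaging token vectors is only one simple way to obtain a fixed-length text vector and is not taken from the survey; note that averaging, like bag of words, also discards word order.

```python
import random

def build_vocab(docs):
    """Map each distinct lowercase token to a column index."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def bag_of_words(doc, vocab):
    """Count vector with one dimension per vocabulary word; word order is lost."""
    vec = [0] * len(vocab)
    for tok in doc.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

def make_toy_embeddings(vocab, dim=4, seed=0):
    """Hypothetical stand-in for a trained embedding table: random dense vectors."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1, 1) for _ in range(dim)] for tok in vocab}

def average_embedding(doc, table, dim=4):
    """Map variable-length text to a fixed-length dense vector by averaging token vectors."""
    vecs = [table[t] for t in doc.lower().split() if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Two made-up clinical-style sentences of different lengths.
docs = [
    "patient denies chest pain",
    "chest pain radiating to left arm after exertion",
]
vocab = build_vocab(docs)
table = make_toy_embeddings(vocab)
bow = [bag_of_words(d, vocab) for d in docs]
dense = [average_embedding(d, table) for d in docs]

print("bag-of-words length:", len(bow[0]))      # grows with vocabulary size
print("dense vector length:", len(dense[0]))    # fixed at dim, whatever the text length
```

With a realistic clinical vocabulary the bag-of-words vector would have tens of thousands of mostly zero entries, while the dense vector keeps the same small dimensionality for every input.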
