# A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics

**Authors:** Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, C Raina MacIntyre

arXiv: 1906.05468 · 2019-06-14

## TL;DR

This paper compares word-based and context-based text representations for health informatics classification tasks, finding that context-based methods generally outperform word-based ones with a 2-4% accuracy gain.

## Contribution

It provides a systematic comparison of word-based versus context-based text representations across three health-related classification problems (influenza infection, drug usage, and personal health mention), showing that context-based embeddings perform better.

## Key findings

- Context-based representations improve accuracy by 2-4% over word-based ones.
- ELMo, Universal Sentence Encoder, Neural-Net Language Model, and FLAIR outperform Word2Vec, GloVe, and the two variants of these adapted using the MESH ontology.
- The improvement holds for classifiers trained on each of the three tasks: influenza infection, drug usage, and personal health mention classification.

## Abstract

Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification and personal health mention classification. For statistical classifiers trained for each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR are better than Word2Vec, GloVe and the two adapted using the MESH ontology. There is an improvement of 2-4% in the accuracy when these context-based representations are used instead of word-based representations.
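To make the "composition of word vectors" idea concrete, here is a minimal sketch of a word-based sentence representation. It assumes the composition is a simple average, which this summary does not confirm. The vectors in `TOY_VECTORS` and the helper `sentence_vector` are hypothetical, invented purely for illustration; real Word2Vec or GloVe vectors would come from a pretrained model.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# Real Word2Vec or GloVe vectors would be loaded from a pretrained model.
TOY_VECTORS: Dict[str, List[float]] = {
    "i": [0.1, 0.0, 0.2],
    "have": [0.0, 0.1, 0.1],
    "the": [0.0, 0.0, 0.0],
    "flu": [0.9, 0.8, 0.1],
    "shot": [0.2, 0.1, 0.9],
}


def sentence_vector(
    text: str, vectors: Dict[str, List[float]], dim: int = 3
) -> List[float]:
    """Compose a sentence vector as the mean of its known word vectors.

    Averaging is an assumption made for this sketch, not necessarily
    the composition used in the paper.
    """
    known = [vectors[tok] for tok in text.lower().split() if tok in vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    # zip(*known) groups the values of each dimension across all words.
    return [sum(column) / len(known) for column in zip(*known)]


# The resulting fixed-length vector can serve as the feature input
# to any statistical classifier.
features = sentence_vector("I have the flu", TOY_VECTORS)
```

Because each word contributes the same vector wherever it appears, an averaged representation like this ignores word order and surrounding context. Context-based encoders such as ELMo instead produce vectors that depend on the whole sentence.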

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.05468/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1906.05468/full.md

---
Source: https://tomesphere.com/paper/1906.05468