HealthE: Classifying Entities in Online Textual Health Advice
Joseph Gatto, Parker Seegmiller, Garrett Johnston, Sarah M. Preum

TL;DR
This paper introduces HealthE, a new annotated dataset of health advice entities, and EP S-BERT, a health entity classification model that improves F1 by 4 points over the nearest baseline and by 34 points over standard medical NER tools.
Contribution
The paper presents a novel annotated dataset, HealthE, with detailed health advice annotations, and a new classification model, EP S-BERT, that leverages context for better accuracy.
Findings
EP S-BERT outperforms the nearest baseline by 4 F1 points.
The HealthE dataset covers diverse health phrases with a more granular label space than existing medical NER corpora.
EP S-BERT achieves a 34-point F1 improvement over standard medical NER tools.
Abstract
The processing of entities in natural language is essential to many medical NLP systems. Unfortunately, existing datasets vastly under-represent the entities required to model public-health-relevant texts, such as the health advice often found on sites like WebMD. People rely on such information for personal health management and clinically relevant decision making. In this work, we release a new annotated dataset, HealthE, consisting of 6,756 pieces of health advice. HealthE has a more granular label space than existing medical NER corpora and contains annotations for diverse health phrases. Additionally, we introduce a new health entity classification model, EP S-BERT, which leverages textual context patterns in the classification of entity classes. EP S-BERT provides a 4-point increase in F1 score over the nearest baseline and a 34-point increase in F1 when compared to off-the-shelf medical…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
