Using Ontologies To Improve Performance In Massively Multi-label Prediction Models

Ethan Steinberg; Peter J. Liu

arXiv:1905.12126 · cs.LG · May 30, 2019 · 1 citation


Open Access

TL;DR

This paper introduces a novel neural network output layer that leverages ontologies to improve prediction accuracy for rare labels in massively multi-label classification tasks, such as disease and protein function prediction.

Contribution

It proposes replacing the output layer with a Bayesian network of sigmoids that encodes ontology relationships between labels, so that rare labels can share information with more common ones.
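A minimal sketch of how such an output layer might turn per-label logits into label probabilities. It assumes a tree-shaped ontology (each label has at most one parent) and the constraint that a label can be positive only if its parent is, so each sigmoid models P(label = 1 | parent = 1, x) and a label's marginal probability is the product of the sigmoids along its path to the root. The function names and toy labels are hypothetical, not taken from the paper. ICD-9 codes form a tree, whereas the Gene Ontology is a DAG, which this sketch does not handle.

```python
import math
from typing import Dict, Optional


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def marginal_probs(
    logits: Dict[str, float], parent: Dict[str, Optional[str]]
) -> Dict[str, float]:
    """Return P(label = 1 | x) for every label in a tree-shaped ontology.

    Assumption (hypothetical sketch): sigmoid(logits[label]) models
    P(label = 1 | parent = 1, x), and P(label = 1 | parent = 0, x) = 0,
    so the marginal is a product of sigmoids along the path to the root.
    """
    cache: Dict[str, float] = {}

    def prob(label: str) -> float:
        if label not in cache:
            conditional = sigmoid(logits[label])
            par = parent[label]
            # Root labels have no parent; otherwise chain through the parent.
            cache[label] = conditional if par is None else conditional * prob(par)
        return cache[label]

    return {label: prob(label) for label in logits}


if __name__ == "__main__":
    # Hypothetical three-level hierarchy: root -> infection -> sepsis.
    parents = {"root": None, "infection": "root", "sepsis": "infection"}
    logits = {"root": 2.0, "infection": 0.0, "sepsis": -1.0}
    print(marginal_probs(logits, parents))
```

One reading of this design: most of a rare label's probability is carried by its frequent ancestors, which have many positive examples, so the rare label's own sigmoid only has to learn the narrower conditional decision. By construction, no child can receive a higher probability than its parent.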

Findings

01 Significant improvements in per-label AUROC for rare labels

02 Improved per-label average precision for infrequent labels

03 Applied to disease prediction (ICD-9 codes) and protein function prediction (Gene Ontology terms)

Abstract

Massively multi-label prediction/classification problems arise in environments like healthcare or biology where very precise predictions are useful. One challenge with massively multi-label problems is that there is often a long-tailed frequency distribution for the labels, which results in few positive examples for the rare labels. We propose a solution to this problem by modifying the output layer of a neural network to create a Bayesian network of sigmoids which takes advantage of ontology relationships between the labels to help share information between the rare and the more common labels. We apply this method to the two massively multi-label tasks of disease prediction (ICD-9 codes) and protein function prediction (Gene Ontology terms) and obtain significant improvements in per-label AUROC and average precision for less common labels.
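To make the evaluation claim concrete, here is a sketch of per-label AUROC and average precision, reported only for labels below a frequency cutoff. The cutoff `max_positives` and all function names are hypothetical; the abstract does not say how "less common labels" are defined. AUROC is computed as the probability that a random positive outscores a random negative (ties count as half), and AP uses the step-wise sum over thresholds, (R_k - R_{k-1}) * P_k, the same definition scikit-learn's `average_precision_score` documents.

```python
from typing import Dict, List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count as half). O(P * N), chosen for clarity over speed."""
    pos = [s for s, y in zip(scores, y_true) if y]
    neg = [s for s, y in zip(scores, y_true) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AP = sum over thresholds of (R_k - R_{k-1}) * P_k, with tied scores
    treated as a single threshold."""
    n_pos = sum(1 for y in y_true if y)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before measuring.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def rare_label_metrics(
    y_true: List[List[int]], scores: List[List[float]], max_positives: int
) -> Dict[int, Tuple[float, float]]:
    """Per-label (AUROC, AP) for labels with 1..max_positives positives.

    Rows are examples, columns are labels. Labels with no positives or no
    negatives are skipped because AUROC is undefined for them.
    """
    results: Dict[int, Tuple[float, float]] = {}
    n_examples = len(y_true)
    for j in range(len(y_true[0])):
        col_y = [row[j] for row in y_true]
        col_s = [row[j] for row in scores]
        n_pos = sum(col_y)
        if n_pos == 0 or n_pos == n_examples or n_pos > max_positives:
            continue
        results[j] = (auroc(col_y, col_s), average_precision(col_y, col_s))
    return results


if __name__ == "__main__":
    y = [[1, 1], [0, 1], [1, 1], [0, 0]]
    s = [[0.9, 0.5], [0.8, 0.4], [0.7, 0.3], [0.1, 0.2]]
    print(rare_label_metrics(y, s, max_positives=2))
```

Computing metrics per label, rather than pooling all predictions into one micro-averaged score, keeps the few positives of rare labels from being swamped by the frequent ones.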

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics · Topic Modeling