Explainable Multi-class Classification of Medical Data

YuanZheng Hu; Marina Sokolova

arXiv:2012.13796 · cs.LG · December 29, 2020 · 1 cites

Open Access

TL;DR

This paper presents an explainable multi-class classification approach for medical data, emphasizing feature engineering, data balancing, and model selection, evaluated on hospital readmission data with improved recall and accuracy.

Contribution

It introduces a comprehensive methodology for explainable multi-class classification in medical datasets, including knowledge-based feature engineering and empirical evaluation of six algorithms.

Findings

01. Using 23 medication features improves recall for most algorithms.

02. Gradient Boosting and Random Forest achieve the highest accuracy.

03. The study expands previous results on the UCI Diabetes dataset.
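
The findings report recall and accuracy separately, and on imbalanced multi-class data the two can diverge sharply. The sketch below illustrates this with plain Python. The class names and predictions are hypothetical, chosen only for illustration, and the macro average is one common way to summarize per-class recall; this summary does not state which averaging the paper uses.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions divided by true members."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in support}


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so rare classes count equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical, imbalanced three-class labels (not taken from the paper).
y_true = ["majority"] * 6 + ["middle"] * 3 + ["rare"] * 1
y_pred = ["majority"] * 6 + ["majority", "middle", "middle"] + ["majority"]

print(f"accuracy     = {accuracy(y_true, y_pred):.3f}")      # 0.800
print(f"macro recall = {macro_recall(y_true, y_pred):.3f}")  # 0.556
print(per_class_recall(y_true, y_pred))
```

Here accuracy looks strong at 0.8 while the rare class is never recovered, which pulls macro recall down to about 0.56. This is why a gain in recall can matter even when accuracy barely moves.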

Abstract

Machine Learning applications have brought new insights into the secondary analysis of medical data. Machine Learning helps to develop new drugs, define populations susceptible to certain illnesses, and identify predictors of many common diseases. At the same time, Machine Learning results depend on a combination of many factors, including feature selection, class (im)balance, algorithm preference, and performance metrics. In this paper, we present explainable multi-class classification of a large medical data set. We discuss in detail knowledge-based feature engineering, data set balancing, best model selection, and parameter tuning. Six algorithms are used in this study: Support Vector Machine (SVM), Naïve Bayes, Gradient Boosting, Decision Trees, Random Forest, and Logistic Regression. Our empirical evaluation is done on the UCI Diabetes 130-US hospitals for years 1999-2008 dataset, with…
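
The abstract names data set balancing as one of the steps but, as truncated here, does not say which technique is used. As a rough illustration only, the sketch below shows random oversampling, one common balancing approach, in plain Python. The function name `random_oversample`, the toy rows, and the labels are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class. Illustrative sketch only."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 6, 3, and 1 rows in three classes.
rows = [[i] for i in range(10)]
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))           # class counts before balancing
print(Counter(balanced_labels))  # every class now has 6 rows
```

In practice, resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.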

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · COVID-19 diagnosis using AI

Methods: Logistic Regression · Convolution