# Classification of radiology reports by modality and anatomy: A comparative study

**Authors:** Marina Bendersky, Joy Wu, Tanveer Syeda-Mahmood

arXiv: 1812.10818 · 2018-12-31

## TL;DR

This study compares machine learning models for classifying radiology reports by modality and anatomy. Logistic regression performs best in both the binary and multiclass tasks, and its binary classifier reaches average precision above 0.9 on datasets unseen during training.

## Contribution

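The paper shows that relatively simple machine learning models, logistic regression in particular, classify radiology reports with high performance in both binary and multiclass settings and outperform a data-driven NLP-based approach. The binary logistic regression classifier also generalizes to report collections that were unseen during training.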

## Key findings

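- Logistic regression outperforms all other compared models in both the binary and multiclass classification tasks.
- The binary CXR/non-CXR logistic regression classifier achieves average precision above 0.9 on report collections unseen during training.
- Regression coefficients identify frequent tokens in CXR and non-CXR reports that may have high predictive power.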
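Average precision, the metric quoted above, summarizes how well a classifier ranks positive examples ahead of negative ones. This summary does not say how the authors computed it, so the following is a minimal sketch of one common definition (the mean of precision at each rank where a positive occurs). The function name `average_precision` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive example occurs.

    labels: list of 0/1 ground-truth labels (1 = positive, e.g. CXR).
    scores: list of classifier scores, higher meaning "more likely positive".
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("average precision is undefined without positive labels")

    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_positives


# Hypothetical toy data: five reports, three of them positive.
toy_labels = [1, 0, 1, 1, 0]
toy_scores = [0.95, 0.80, 0.70, 0.60, 0.10]
ap = average_precision(toy_labels, toy_scores)
```

A perfect ranking (all positives scored above all negatives) gives a value of 1.0. Library implementations may treat tied scores differently from this sketch.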

## Abstract

Data labeling is currently a time-consuming task that often requires expert knowledge. In research settings, the availability of correctly labeled data is crucial to ensure that model predictions are accurate and useful. We propose relatively simple machine learning-based models that achieve high performance metrics in the binary and multiclass classification of radiology reports. We compare the performance of these algorithms to that of a data-driven approach based on NLP, and find that the logistic regression classifier outperforms all other models, in both the binary and multiclass classification tasks. We then choose the logistic regression binary classifier to predict chest X-ray (CXR)/non-chest X-ray (non-CXR) labels in reports from different datasets, unseen during any training phase of any of the models. Even in unseen report collections, the binary logistic regression classifier achieves average precision values above 0.9. Based on the regression coefficient values, we also identify frequent tokens in CXR and non-CXR reports that are features with possibly high predictive power.
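To make the idea of a binary CXR/non-CXR logistic regression classifier and its token coefficients concrete, here is a minimal sketch using only the Python standard library. It is not the authors' pipeline: the whitespace tokenizer, bag-of-words counts, plain stochastic gradient descent, hyperparameters, and the six toy reports are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy reports (not from the paper); label 1 = CXR, 0 = non-CXR.
REPORTS = [
    ("pa and lateral chest radiograph lungs are clear", 1),
    ("portable chest film shows no pneumothorax", 1),
    ("chest radiograph heart size normal lungs clear", 1),
    ("ct abdomen with contrast liver unremarkable", 0),
    ("mri brain without contrast no acute infarct", 0),
    ("ultrasound abdomen gallbladder normal liver unremarkable", 0),
]


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(text, weights, bias):
    # Probability that a report is CXR; unknown tokens contribute nothing.
    counts = Counter(tokenize(text))
    z = bias + sum(weights.get(tok, 0.0) * n for tok, n in counts.items())
    return sigmoid(z)


def train(reports, epochs=200, lr=0.1):
    # One weight per vocabulary token, fitted by stochastic gradient descent
    # on the log loss (no regularization in this sketch).
    weights = {tok: 0.0 for text, _ in reports for tok in tokenize(text)}
    bias = 0.0
    for _ in range(epochs):
        for text, label in reports:
            error = predict_proba(text, weights, bias) - label
            bias -= lr * error
            for tok, n in Counter(tokenize(text)).items():
                weights[tok] -= lr * error * n
    return weights, bias


weights, bias = train(REPORTS)

# Tokens with the largest positive coefficients push toward CXR,
# those with the most negative coefficients push toward non-CXR.
by_weight = sorted(weights, key=weights.get, reverse=True)
top_cxr_tokens = by_weight[:3]
top_non_cxr_tokens = by_weight[-3:]

p_cxr = predict_proba("chest radiograph lungs clear", weights, bias)
p_other = predict_proba("ct abdomen liver", weights, bias)
```

Sorting tokens by coefficient mirrors, in spirit, how the abstract describes reading predictive tokens off the regression coefficients. With real data one would add regularization and held-out evaluation, which this sketch leaves out.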

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.10818/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1812.10818/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1812.10818/full.md

---
Source: https://tomesphere.com/paper/1812.10818