# Disease Identification From Unstructured User Input

**Authors:** Fahim Faisal (1), Shafkat Ahmed Bhuiyan (1), Abu Raihan Mostofa Kamal (1) ((1) Islamic University of Technology)

arXiv: 1905.01987 · 2024-09-05

## TL;DR

This paper presents a novel two-phase text classification approach that combines lexicographic, semantic, and symptom-disease correlation features to accurately identify diseases from unstructured user input like health forum posts.

## Contribution

The paper introduces an algorithm that extracts features from unstructured text to improve disease identification accuracy.

## Key findings

- Effective in extracting features from unstructured data
- Improves disease classification accuracy
- Utilizes symptom-disease correlation for better results

## Abstract

This paper presents a method to identify probable diseases from unstructured textual input (e.g., health forum posts). It combines a two-phase text classification module based on lexicographic and semantic features with a similarity measurement module based on symptom-disease correlation. One notable aspect of the approach is an algorithm designed to extract all inherent features from the data source in order to make a better decision.
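
To make the idea of a symptom-disease similarity measurement concrete, here is a minimal sketch. It is not the authors' algorithm: the abstract does not specify the similarity measure, so Jaccard overlap is swapped in as a simple stand-in, and the `DISEASE_SYMPTOMS` table, the function names, and the sample post are all hypothetical.

```python
import re

# Hypothetical symptom-disease table, for illustration only (not from the paper).
DISEASE_SYMPTOMS = {
    "influenza": {"fever", "cough", "fatigue", "headache"},
    "migraine": {"headache", "nausea", "dizziness"},
    "gastroenteritis": {"nausea", "vomiting", "diarrhea", "fever"},
}


def extract_symptoms(text, vocabulary):
    """Return the known symptom words that occur in free-form text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return tokens & vocabulary


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_diseases(text, table=DISEASE_SYMPTOMS):
    """Rank diseases by overlap between their symptoms and those found in text."""
    vocabulary = set().union(*table.values())
    found = extract_symptoms(text, vocabulary)
    scores = [(disease, jaccard(found, symptoms)) for disease, symptoms in table.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Assumed example of a forum-style post.
post = "I have had a fever and a bad cough since Monday, plus constant fatigue."
ranking = rank_diseases(post)
print(ranking[0])  # highest-scoring disease and its similarity
```

A real system along the lines the abstract describes would presumably first run the text through its classification phases and use learned symptom-disease correlations rather than exact word matching; the sketch only shows the shape of the final similarity ranking step.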

---
Source: https://tomesphere.com/paper/1905.01987