Enhancing Multi-Class Disease Classification: Neoplasms, Cardiovascular, Nervous System, and Digestive Disorders Using Advanced LLMs
Ahmed Akib Jawad Karim, Muhammad Zawad Mahmud, Samiha Islam, Aznur Azam

TL;DR
This study evaluates various pre-trained language models for multi-class disease classification, demonstrating that domain-specific models like BioBERT outperform general models, with XLNet and a custom lightweight model also showing strong results.
Contribution
The paper compares the performance of BioBERT, XLNet, BERT, and a new lightweight model on medical text classification, highlighting the effectiveness of specialized versus general models.
Findings
BioBERT achieved 97% accuracy in medical text classification.
XLNet achieved 96% accuracy, showing good generalizability.
LastBERT, a lightweight model, achieved 87.10% accuracy.
Abstract
In this research, we explored how pre-trained language models improve multi-class disease classification on the Medical-Abstracts-TC-Corpus, which spans five medical conditions. We excluded non-cancer conditions and examined four specific diseases. We assessed four LLMs: BioBERT, XLNet, BERT, and a novel base model (LastBERT). BioBERT, which was pre-trained on medical data, demonstrated superior performance in medical text classification (97% accuracy). Surprisingly, XLNet followed closely (96% accuracy), demonstrating its generalizability across domains even though it was not pre-trained on medical data. LastBERT, a custom model based on the lighter version of BERT, also proved competitive with 87.10% accuracy (just under BERT's 89.33%). Our findings confirm the importance of specialized models such as BioBERT and also support impressions around more general…
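The evaluation setup described in the abstract (keep four of the five corpus classes, then score each model by accuracy) can be illustrated with a minimal sketch. This is not the authors' code: the label strings, the `"other"` placeholder for the excluded class, the helper names, and the toy records and predictions are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical label names: the four kept classes follow the paper title;
# "other" stands in for the excluded fifth class, whose real name is assumed.
KEPT_CLASSES = ["neoplasms", "cardiovascular", "nervous_system", "digestive"]
LABEL_TO_ID = {name: i for i, name in enumerate(KEPT_CLASSES)}


def filter_and_encode(records):
    """Drop records outside the four kept classes and map labels to integer ids."""
    return [(text, LABEL_TO_ID[label]) for text, label in records if label in LABEL_TO_ID]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Per-class recall: correct predictions for a class divided by its true count."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Toy, made-up records purely for illustration (not taken from the corpus).
records = [
    ("tumor resection outcomes", "neoplasms"),
    ("myocardial infarction follow-up", "cardiovascular"),
    ("general symptoms survey", "other"),
    ("epilepsy seizure frequency", "nervous_system"),
    ("colitis treatment response", "digestive"),
]
encoded = filter_and_encode(records)
y_true = [label for _, label in encoded]
y_pred = [0, 1, 2, 2]  # hypothetical model predictions, one per kept record
print(accuracy(y_true, y_pred))  # 0.75
```

Overall accuracy is the figure the abstract reports for each model. Per-class recall is included only because a single accuracy number can hide a class that a model consistently misses.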
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging
Methods: Attention Is All You Need · Byte Pair Encoding · SentencePiece · Linear Layer · Multi-Head Attention · Attention Dropout · Dense Connections · Adam · Residual Connection
