Large scale biomedical texts classification: a kNN and an ESA-based approaches
Khadim Dramé (UB), Fleur Mougin (UB), Gayo Diallo (UB)

TL;DR
This paper proposes two methods for classifying large-scale biomedical texts: a kNN-based approach that relies on classical machine learning algorithms, and a standalone classifier based on explicit semantic analysis (ESA). Both are evaluated on standard datasets.
Contribution
It presents a kNN-based approach with improved ranking and multiple algorithms, and an ESA-based classifier as a new standalone method for biomedical text classification.
Findings
kNN with Random Forest achieves an F-measure of 0.55
ESA-based method shows promising results as a standalone classifier
Combining the two methods enhances classification performance
Abstract
With the large and increasing volume of textual data, automated methods for identifying significant topics to classify textual documents have received a growing interest. While many efforts have been made in this direction, it still remains a real challenge. Moreover, the issue is even more complex as full texts are not always freely available. Then, using only partial information to annotate these documents is promising but remains a very ambitious issue. Methods: We propose two classification methods: a k-nearest neighbours (kNN)-based approach and an explicit semantic analysis (ESA)-based approach. Although the kNN-based approach is widely used in text classification, it needs to be improved to perform well in this specific classification problem which deals with partial information. Compared to existing kNN-based methods, our method uses classical Machine Learning (ML) algorithms for…
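To make the kNN idea concrete, here is a minimal sketch of generic neighbour-based label voting: retrieve the k most similar labelled documents and score each candidate label by the summed similarity of the neighbours that carry it. This is not the authors' method. Their approach ranks candidates with classical ML algorithms, and the abstract is truncated before those details. The TF-IDF weighting, the function names, and the toy documents and labels below are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalise far more.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    # Smoothed IDF (an illustrative choice, not taken from the paper).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_label_scores(query, docs, labels, k=3):
    """Rank candidate labels by summed similarity over the k nearest docs."""
    vectors, idf = tfidf_vectors(docs)
    # Terms unseen in the training documents are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    nearest = sorted(
        ((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True
    )[:k]
    scores = defaultdict(float)
    for sim, i in nearest:
        for label in labels[i]:
            scores[label] += sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus: short texts standing in for partial information.
docs = [
    "insulin therapy in diabetes patients",
    "glucose control and insulin resistance",
    "chemotherapy outcomes in breast cancer",
    "tumor growth and cancer genetics",
]
labels = [
    ["Diabetes Mellitus", "Insulin"],
    ["Insulin", "Blood Glucose"],
    ["Neoplasms", "Drug Therapy"],
    ["Neoplasms", "Genetics"],
]

ranked = knn_label_scores("insulin resistance in diabetes", docs, labels, k=2)
```

In this sketch `ranked` is a list of `(label, score)` pairs ordered by score. A learned ranker, such as the Random Forest mentioned in the findings, would replace the summed-similarity score with a model trained on features of each candidate label.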
