Large scale biomedical texts classification: a kNN and an ESA-based approaches

Khadim Dramé (UB); Fleur Mougin (UB); Gayo Diallo (UB)

arXiv:1606.02976 · cs.IR · June 10, 2016

TL;DR

This paper proposes two methods for classifying large-scale biomedical texts: a kNN-based approach that relies on classical machine learning algorithms, and an ESA-based approach that can be used as a standalone classifier. Both are evaluated on standard datasets.

Contribution

The paper presents a kNN-based approach that improves label ranking and is tested with several machine learning algorithms, and an ESA-based approach used as a new standalone classifier for biomedical texts.
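To make the kNN idea concrete, here is a minimal sketch of neighbour-based label ranking for multi-label text. It is not the authors' implementation: the paper ranks candidate labels with classical machine learning algorithms (this page mentions Random Forest), whereas the sketch swaps in a simpler similarity-weighted vote. The function names, the whitespace tokenizer, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_rank_labels(query, corpus, k=2):
    """Collect labels from the k most similar documents and rank them.

    Assumption: each label is scored by the summed similarity of the
    neighbours that carry it (a stand-in for a learned ranking model).
    """
    q = tf_vector(query)
    scored = sorted(
        ((cosine(q, tf_vector(text)), labels) for text, labels in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for similarity, labels in scored[:k]:
        for label in labels:
            votes[label] += similarity
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus of (text, labels) pairs.
corpus = [
    ("insulin therapy in diabetes patients", ["Diabetes Mellitus", "Insulin"]),
    ("diabetes risk and obesity", ["Diabetes Mellitus", "Obesity"]),
    ("gene expression in breast cancer", ["Breast Neoplasms", "Gene Expression"]),
]
ranked = knn_rank_labels("obesity and diabetes in young patients", corpus, k=2)
print([label for label, _ in ranked])
```

In a fuller pipeline, the per-label scores would typically become features for a trained ranker or classifier rather than the final decision.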

Findings

01

kNN with Random Forest achieves an F-measure of 0.55

02

ESA-based method shows promising results as a standalone classifier

03

Combining the two methods improves classification performance
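For readers unfamiliar with the metric cited in the findings, the F-measure is the harmonic mean of precision and recall. The sketch below computes a micro-averaged variant for multi-label predictions. Which variant the paper reports is not stated on this page, so the micro-averaging and the function name are assumptions.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure over parallel lists of label collections."""
    tp = fp = fn = 0
    for gold_labels, predicted_labels in zip(gold, predicted):
        g, p = set(gold_labels), set(predicted_labels)
        tp += len(g & p)   # labels correctly predicted
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two documents with gold and predicted labels.
gold = [["a", "b"], ["c"]]
predicted = [["a"], ["c", "d"]]
print(round(micro_f_measure(gold, predicted), 3))
```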

Abstract

With the large and increasing volume of textual data, automated methods for identifying significant topics to classify textual documents have received a growing interest. While many efforts have been made in this direction, it still remains a real challenge. Moreover, the issue is even more complex as full texts are not always freely available. Then, using only partial information to annotate these documents is promising but remains a very ambitious issue. Methods: We propose two classification methods: a k-nearest neighbours (kNN)-based approach and an explicit semantic analysis (ESA)-based approach. Although the kNN-based approach is widely used in text classification, it needs to be improved to perform well in this specific classification problem which deals with partial information. Compared to existing kNN-based methods, our method uses classical Machine Learning (ML) algorithms for…
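The abstract names explicit semantic analysis (ESA) as the second approach. As a rough illustration of the general technique, the sketch below represents a text as a weighted vector over concepts, where each concept is described by a short text and term weights are TF-IDF values. How the paper builds its concept descriptions is not given in the excerpt above, so the concept texts, function names, and weighting details are assumptions.

```python
import math
from collections import Counter


def build_concept_index(concept_texts):
    """Map each term to {concept: tf-idf weight} (an inverted index)."""
    tfs = {c: Counter(text.lower().split()) for c, text in concept_texts.items()}
    n_concepts = len(tfs)
    # Document frequency: number of concepts whose text contains the term.
    df = Counter(term for tf in tfs.values() for term in tf)
    index = {}
    for concept, tf in tfs.items():
        for term, count in tf.items():
            idf = math.log(n_concepts / df[term])
            if idf > 0:  # terms present in every concept carry no signal
                index.setdefault(term, {})[concept] = count * idf
    return index


def esa_vector(text, index):
    """Interpret a text as a weighted vector over concepts."""
    vector = Counter()
    for term, count in Counter(text.lower().split()).items():
        for concept, weight in index.get(term, {}).items():
            vector[concept] += count * weight
    return vector


def esa_classify(text, index, top_n=1):
    """Return the top_n concepts with the highest ESA weight."""
    return [concept for concept, _ in esa_vector(text, index).most_common(top_n)]


# Hypothetical concept descriptions keyed by label.
concepts = {
    "Diabetes Mellitus": "diabetes insulin glucose blood sugar",
    "Breast Neoplasms": "breast cancer tumor mammography blood",
}
index = build_concept_index(concepts)
print(esa_classify("insulin and glucose levels in blood", index))
```

Used this way, the concepts double as candidate labels, which is one plausible reading of a standalone ESA-based classifier.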
