Comparing Lexical and Semantic Vector Search Methods When Classifying Medical Documents
Lee Harris

TL;DR
This paper compares lexical and semantic vector search methods for classifying medical documents, finding that traditional lexical approaches can outperform neural semantic methods in accuracy and efficiency.
Contribution
It demonstrates that off-the-shelf semantic vector search may not always be superior to lexical methods for structured medical document classification.
Findings
A bespoke lexical vector search model achieved slightly higher predictive accuracy than off-the-shelf semantic vector search.
Semantic search required significantly more computational time.
Traditional lexical methods remain competitive in medical document classification.
Abstract
Classification is a common AI problem, and vector search is a typical solution. Vector search transforms a given body of text into a numerical representation, known as an embedding, and modern improvements to it focus on optimising speed and predictive accuracy. This is often achieved through neural methods that aim to learn language semantics. However, our results suggest that neural methods are not always the best solution. Our task was to classify rigidly-structured medical documents according to their content, and we found that using off-the-shelf semantic vector search produced slightly worse predictive accuracy than creating a bespoke lexical vector search model, and that it required significantly more time to execute. These findings suggest that traditional methods deserve to be contenders in the information retrieval toolkit, despite the prevalence and success of neural models.
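To make the idea of classifying by lexical vector search concrete, here is a minimal sketch. It is not the paper's model: the TF-IDF weighting, the whitespace tokenizer, the nearest-neighbour decision rule, the helper names (`build_idf`, `embed`, `classify`), and the three toy documents and their labels are all assumptions chosen for illustration. Each document becomes a sparse, length-normalised term-weight vector, and a query is given the label of the most similar stored vector.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would do more.
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency for every term seen in the corpus.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def embed(text, idf):
    # Sparse TF-IDF vector, L2-normalised so a dot product equals cosine similarity.
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    # Dot product of two normalised sparse vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def classify(query, index, idf):
    # Nearest-neighbour search: return the label of the most similar document.
    q = embed(query, idf)
    label, _ = max(((lab, cosine(q, vec)) for lab, vec in index), key=lambda p: p[1])
    return label


# Hypothetical toy corpus; the labels are illustrative, not from the paper.
TRAIN = [
    ("discharge summary patient admitted with chest pain discharged home", "discharge_summary"),
    ("radiology report chest x-ray shows no acute abnormality", "radiology_report"),
    ("prescription amoxicillin 500 mg three times daily", "prescription"),
]

idf = build_idf([text for text, _ in TRAIN])
index = [(label, embed(text, idf)) for text, label in TRAIN]
predicted = classify("x-ray report of the chest shows a small abnormality", index, idf)
print(predicted)
```

A semantic variant would keep the same search step and swap `embed` for a neural encoder, which is where the extra execution time reported in the abstract would be expected to arise.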
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Text and Document Classification Technologies · Topic Modeling
Methods: Focus · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
