Information Retrieval in long documents: Word clustering approach for improving Semantics
Paul Mbathe Mekontchou, Armel Fotsoh, Bernabe Batchakui, Eddy Ella

TL;DR
This paper introduces a clustering-based semantic retrieval method for long documents that improves on traditional keyword-based approaches by leveraging word embeddings and a novel clustering algorithm, demonstrating significant performance gains.
Contribution
It presents a new clustering algorithm for semantic word grouping and a combined lexical-semantic retrieval model tailored for long document retrieval.
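This page does not describe the paper's clustering algorithm. The sketch below only illustrates the general idea of grouping words whose embeddings are close. It is not the authors' method. It uses a simple single-pass ("leader") threshold clustering over cosine similarity. The function names, the 0.9 threshold, and the toy 2-D vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_words(embeddings: Dict[str, List[float]],
                  threshold: float = 0.9) -> List[List[str]]:
    """Hypothetical leader clustering (not the paper's algorithm).

    Each word joins the first cluster whose leader vector has cosine
    similarity >= threshold. Otherwise it starts a new cluster and
    becomes that cluster's leader.
    """
    clusters: List[List[str]] = []
    leaders: List[List[float]] = []
    for word, vec in embeddings.items():
        for idx, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[idx].append(word)
                break
        else:  # no leader was similar enough
            clusters.append([word])
            leaders.append(vec)
    return clusters


# Toy, made-up 2-D "embeddings" for illustration only.
toy = {
    "car": [1.0, 0.1],
    "automobile": [0.95, 0.15],
    "banana": [0.05, 1.0],
    "fruit": [0.1, 0.9],
}
print(cluster_words(toy, 0.9))  # [['car', 'automobile'], ['banana', 'fruit']]
```

A leader scheme is order-dependent and needs a hand-picked threshold. A purpose-built algorithm, such as the one the paper proposes, would presumably address these weaknesses.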
Findings
Significant improvement over classical keyword-based methods.
Effective in both long and short document contexts.
Maintains lexical accuracy while enhancing semantic understanding.
Abstract
In this paper, we propose an alternative to deep neural networks for semantic information retrieval in long documents. This new approach exploits clustering techniques to take the meaning of words into account in Information Retrieval systems targeting both long and short documents. It uses a specially designed clustering algorithm to group words with similar meanings into clusters. The dual (lexical and semantic) representation of documents and queries is based on the vector space model proposed by Gerard Salton, applied in the vector space constituted by the formed clusters. The originality of our proposal lies at several levels: first, we propose an efficient algorithm for building clusters of semantically close words, using word embeddings as input; then we define a formula for weighting these clusters; and then we propose a function for combining…
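The abstract describes three ingredients: a vector space model over word clusters, a cluster-weighting formula, and a function that combines lexical and semantic scores. The abstract is truncated and does not give the exact weighting formula or combination function. This sketch therefore substitutes standard smoothed TF-IDF as the weighting and a linear interpolation with a hypothetical weight `alpha`. The helper names and the toy word-to-cluster mapping are also hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List


def build_idf(collection: List[Counter]) -> Dict[Hashable, float]:
    """Smoothed IDF, log((N + 1) / (df + 1)) + 1, for every key (word or cluster id)."""
    n = len(collection)
    df = Counter(k for counts in collection for k in counts)
    return {k: math.log((n + 1) / (d + 1)) + 1.0 for k, d in df.items()}


def tfidf(counts: Counter, idf: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Weight raw counts by IDF; keys unseen in the collection get weight 0."""
    return {k: c * idf.get(k, 0.0) for k, c in counts.items()}


def sparse_cosine(u: Dict[Hashable, float], v: Dict[Hashable, float]) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def to_clusters(tokens: List[str], word_to_cluster: Dict[str, int]) -> Counter:
    """Replace each token by its cluster id; tokens without a cluster are dropped."""
    return Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)


def rank(query: List[str], docs: List[List[str]],
         word_to_cluster: Dict[str, int], alpha: float = 0.5) -> List[float]:
    """Score each document as alpha * lexical + (1 - alpha) * semantic cosine."""
    lex_docs = [Counter(d) for d in docs]                       # word space
    sem_docs = [to_clusters(d, word_to_cluster) for d in docs]  # cluster space
    lex_idf, sem_idf = build_idf(lex_docs), build_idf(sem_docs)
    q_lex = tfidf(Counter(query), lex_idf)
    q_sem = tfidf(to_clusters(query, word_to_cluster), sem_idf)
    scores = []
    for lex, sem in zip(lex_docs, sem_docs):
        lexical = sparse_cosine(q_lex, tfidf(lex, lex_idf))
        semantic = sparse_cosine(q_sem, tfidf(sem, sem_idf))
        scores.append(alpha * lexical + (1 - alpha) * semantic)
    return scores


mapping = {"car": 0, "automobile": 0, "banana": 1, "fruit": 1}
docs = [["automobile", "repair"], ["banana", "bread"]]
print(rank(["car"], docs, mapping))  # approximately [0.5, 0.0]
```

In this toy case, a purely lexical match (`alpha=1.0`) scores both documents 0, because "car" never appears verbatim. The cluster-space component still ranks the "automobile" document first. This is the kind of gap that combining lexical and semantic representations is meant to close.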
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies · Advanced Text Analysis Techniques
