Cluster-Based Information Retrieval by using (K-means)- Hierarchical Parallel Genetic Algorithms Approach
Sarah Hussein Toman, Mohammed Hamzah Abed, Zinah Hussein Toman

TL;DR
This paper introduces a novel (K-means)-Hierarchical Parallel Genetic Algorithms approach for cluster-based information retrieval, significantly improving precision and efficiency over traditional IR methods on multiple datasets.
Contribution
It combines K-means clustering with hybrid parallel genetic algorithms to enhance IR quality and processing speed, reducing irrelevant documents in large datasets.
Findings
Precision improved by up to 45% over IR-GA
Significant accuracy gains over classic IR methods
Effective clustering reduces irrelevant document retrieval
Abstract
Cluster-based information retrieval is an information retrieval (IR) technique that organizes web documents, extracts their features, and categorizes them according to their similarity. Unlike traditional approaches, cluster-based IR processes large document datasets quickly. To improve the quality of retrieved documents, increase the efficiency of IR, and reduce the number of irrelevant documents returned for a user search, this paper proposes a (K-means)-Hierarchical Parallel Genetic Algorithms approach (HPGA) that combines the K-means clustering algorithm with a hybrid parallel genetic algorithm (PG) built from multi-deme and master/slave PG algorithms. K-means is used to cluster the population into k subpopulations; the clusters most relevant to the query are then processed in parallel by the two levels of genetic parallelism. Irrelevant documents are therefore not included in the subpopulations, which improves the quality of the results.…
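The cluster-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmeans`, `select_subpopulations`), the use of cosine similarity between the query and each centroid as the relevance measure, and the toy two-dimensional document vectors are all assumptions made for illustration. The genetic stage (multi-deme and master/slave parallelism) is left out entirely; the sketch only shows how K-means could split the documents into k subpopulations and keep those closest to the query.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def select_subpopulations(vectors, query, k, top):
    """Cluster documents, then keep the `top` clusters most similar to the query.

    Assumption: relevance of a cluster is the cosine similarity between
    its centroid and the query vector.
    """
    labels, centroids = kmeans(vectors, k)
    ranked = sorted(range(k), key=lambda c: cosine(centroids[c], query),
                    reverse=True)
    return {c: [i for i, lab in enumerate(labels) if lab == c]
            for c in ranked[:top]}


# Toy example: documents 0 and 1 resemble the query, documents 2 and 3 do not.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
query = [1.0, 0.0]
selected = select_subpopulations(docs, query, k=2, top=1)
```

In this toy run, `selected` holds a single subpopulation containing document indices 0 and 1, so documents 2 and 3 would never reach a later genetic search stage.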
Taxonomy
Methods: k-Means Clustering
