Web Document Categorization Using Naive Bayes Classifier and Latent Semantic Analysis
Alireza Saleh Sedghpour, Mohammad Reza Saleh Sedghpour

TL;DR
This paper presents a method combining Naive Bayes and Latent Semantic Analysis to improve the accuracy and speed of web document classification, addressing challenges of high dimensionality and semantic relations.
Contribution
It introduces a novel approach integrating LSA with Naive Bayes to enhance classification performance on large-scale web documents.
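As a rough illustration of what integrating LSA with Naive Bayes can look like, the sketch below chains TF-IDF weighting, a truncated SVD (the usual way to compute LSA), and a Naive Bayes classifier in scikit-learn. It is not the authors' implementation. The toy corpus, the labels, the `build_lsa_nb` helper, and the choice of two latent dimensions are all assumptions made for illustration. Gaussian Naive Bayes is used because LSA components can be negative, which a multinomial model does not accept; this summary does not say which Naive Bayes variant the paper uses.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for web documents (not from the paper).
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the players after the game",
    "the new laptop has a fast processor",
    "the software update fixes a security bug",
    "the phone ships with a faster chip",
]
train_labels = ["sports", "sports", "sports", "tech", "tech", "tech"]


def build_lsa_nb(n_components=2):
    """Build a TF-IDF -> truncated SVD (LSA) -> Gaussian Naive Bayes pipeline.

    n_components is the number of latent semantic dimensions kept; the
    value 2 is an arbitrary choice for this tiny example.
    """
    return make_pipeline(
        # Sparse, high-dimensional term weights.
        TfidfVectorizer(stop_words="english"),
        # LSA step: project documents onto a few dense latent dimensions.
        TruncatedSVD(n_components=n_components, random_state=0),
        # Classifier fitted on the low-dimensional LSA features.
        GaussianNB(),
    )


model = build_lsa_nb()
model.fit(train_docs, train_labels)

test_docs = ["the players scored in the match", "a processor bug in the laptop"]
predictions = model.predict(test_docs)
print(list(predictions))
```

The classifier here only ever sees the dense, low-dimensional LSA features, which is one way to sidestep the high dimensionality and sparsity of raw term vectors.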
Findings
Improved classification accuracy and speed.
Enhanced precision and recall metrics.
Better handling of high-dimensional, sparse data.
Abstract
The rapid growth of web documents, driven by heavy use of the World Wide Web, calls for efficient techniques to classify documents on the web. High volumes of highly diverse data are produced every second. Automatic classification of this growing body of web documents is one of the biggest challenges facing us today. Probabilistic classification algorithms such as Naive Bayes have become commonly used for web document classification, mainly because of their relatively high classification accuracy in many application areas. They nevertheless lack support for high-dimensional and sparse data, which is characteristic of textual data representations. Traditional feature selection methods also commonly neglect the semantic relations between words when dealing with big data and large-scale web…
Taxonomy
Topics: Text and Document Classification Technologies · Web Data Mining and Analysis · Spam and Phishing Detection
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Feature Selection
