A Comparative Study on TF-IDF feature Weighting Method and its Analysis using Unstructured Dataset
Mamata Das, Selvakumar K., P.J.A. Alphonse

TL;DR
This study compares TF-IDF and N-Gram feature weighting methods for text classification on unstructured datasets, demonstrating TF-IDF's superior performance across multiple classifiers in sentiment analysis tasks.
Contribution
It provides an empirical analysis of TF-IDF versus N-Gram features, highlighting TF-IDF's effectiveness in improving classification accuracy on unstructured text data.
Findings
TF-IDF outperforms N-Gram in feature extraction.
Maximum accuracy achieved was 93.81% with TF-IDF and Random Forest.
TF-IDF significantly enhances sentiment classification results.
Abstract
Text classification is the process of assigning text to relevant categories, and its algorithms are at the core of many Natural Language Processing (NLP) applications. Term Frequency-Inverse Document Frequency (TF-IDF) and NLP are the most widely used information retrieval methods in text classification. We investigate and analyze feature weighting methods for text classification on unstructured data. The proposed model considers two feature types, N-Grams and TF-IDF, on the IMDB movie reviews and Amazon Alexa reviews datasets for sentiment analysis. We then validate the method with state-of-the-art classifiers, namely Support Vector Machine (SVM), Logistic Regression, Multinomial Naive Bayes (Multinomial NB), Random Forest, Decision Tree, and k-nearest neighbors (KNN). From those two feature extractions, a significant increase in feature extraction with TF-IDF features rather…
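To make the two feature types in the abstract concrete, here is a minimal pure-Python sketch of word N-gram counts and TF-IDF weights. The abstract does not say which TF-IDF variant, tokenizer, or N-gram range the authors used, so the formula (relative term frequency times log(N / df)), the whitespace tokenizer, the function names, and the three toy reviews are all assumptions made for illustration, not the paper's implementation.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (joined by spaces) of a lowercased text."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(docs, n):
    """Raw n-gram count features: one Counter per document."""
    return [Counter(word_ngrams(doc, n)) for doc in docs]


def tfidf(docs):
    """TF-IDF weights over unigrams, assuming tf * log(N / df)."""
    counts = ngram_counts(docs, 1)
    n_docs = len(docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {t: (k / total) * math.log(n_docs / df[t]) for t, k in c.items()}
        )
    return weights


# Hypothetical toy reviews, not taken from the IMDB or Amazon Alexa datasets.
docs = ["great movie great cast", "terrible movie", "great sound quality"]

bigrams = ngram_counts(docs, 2)  # N-gram features with n = 2
weights = tfidf(docs)            # TF-IDF features

print(bigrams[0])
print(weights[1])
```

In the second toy review, "terrible" appears in only one document while "movie" appears in two, so "terrible" receives the larger TF-IDF weight. Either feature dictionary could then be vectorized and passed to any of the classifiers listed in the abstract.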
Taxonomy
Topics: Text and Document Classification Technologies · Network Security and Intrusion Detection · Anomaly Detection Techniques and Applications
Methods: Logistic Regression
