Text classification using machine learning methods
Bogdan Oancea

TL;DR
This paper compares machine learning and word embedding techniques for automatic product classification. It reports high accuracy for several classifiers, notably Support Vector Machines, Logistic Regression, and Random Forests, and for FASTTEXT embeddings.
Contribution
It compares multiple embedding and classification methods, identifying FASTTEXT and Support Vector Machines as the most effective combination.
Findings
Support Vector Machines, Logistic Regression, and Random Forests achieved high accuracy.
FASTTEXT embedding outperformed other embedding methods.
The approach enables effective automatic classification of products.
Abstract
In this paper we present the results of an experiment aimed at using machine learning methods to obtain models that can be used for the automatic classification of products. To apply automatic classification methods, we transformed the product names from a text representation into numeric vectors, a process called word embedding. We used several embedding methods: Count Vectorization, TF-IDF, Word2Vec, FASTTEXT, and GloVe. With the product names represented as numeric vectors, we applied a set of machine learning methods for automatic classification: Logistic Regression, Multinomial Naive Bayes, kNN, Artificial Neural Networks, Support Vector Machines, and Decision Trees in several variants. The results show an impressive classification accuracy for Support Vector Machines, Logistic Regression, and Random Forests. Regarding the word embedding methods, the…
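The abstract describes a two-stage pipeline. First, each product name is turned into a numeric vector. Second, a classifier is trained on those vectors. The sketch below is a minimal, illustrative version of that pipeline in plain Python. It rests on these assumptions:

- tokenization is a simple lowercase whitespace split;
- the product names and categories are a tiny hypothetical sample, not the paper's data;
- it implements only three of the listed methods: Count Vectorization, TF-IDF weighting, and Multinomial Naive Bayes.

The names `tokenize`, `count_vectorize`, `tf_idf` and `MultinomialNB` are hypothetical and are not the author's code. A real experiment would more likely rely on a library such as scikit-learn.

```python
import math
from collections import Counter


def tokenize(name):
    # Assumption: lowercase + whitespace split; real product names usually
    # need more normalization (units, brands, punctuation).
    return name.lower().split()


def count_vectorize(docs):
    """Count Vectorization: one sparse {term: count} vector per document."""
    return [Counter(tokenize(d)) for d in docs]


def tf_idf(docs):
    """TF-IDF weights with smoothed idf = ln((1 + n) / (1 + df)) + 1.

    This is the default idf formula of scikit-learn's TfidfVectorizer;
    its final L2 row normalization is omitted here for brevity.
    """
    counts = count_vectorize(docs)
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


class MultinomialNB:
    """Multinomial Naive Bayes over term counts with Laplace smoothing (alpha)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        counts = count_vectorize(docs)
        self.vocab = {t for c in counts for t in c}
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for cls in self.classes:
            members = [c for c, y in zip(counts, labels) if y == cls]
            self.log_prior[cls] = math.log(len(members) / len(docs))
            totals = Counter()
            for c in members:
                totals.update(c)  # sum term counts over the class
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_lik[cls] = {
                t: math.log((totals[t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, name):
        c = Counter(tokenize(name))

        def score(cls):
            # Log-posterior up to a constant; terms outside the vocabulary are ignored.
            return self.log_prior[cls] + sum(
                k * self.log_lik[cls][t] for t, k in c.items() if t in self.vocab
            )

        return max(self.classes, key=score)


# Hypothetical toy data, not from the paper.
names = [
    "fresh whole milk 1l",
    "semi skimmed milk 2l",
    "cheddar cheese block",
    "white sliced bread",
    "wholemeal bread loaf",
    "butter croissant pack",
]
labels = ["dairy", "dairy", "dairy", "bakery", "bakery", "bakery"]

model = MultinomialNB().fit(names, labels)
print(model.predict("organic milk"))  # dairy
print(model.predict("bread loaf"))    # bakery
```

Multinomial Naive Bayes appears here only because it fits in a few lines. The `tf_idf` output would more typically be passed to a discriminative model, such as the Support Vector Machines or Logistic Regression that the abstract reports as most accurate. Dense embeddings such as Word2Vec, FASTTEXT, or GloVe would replace the sparse term vectors with learned vectors, for example by averaging the word vectors of a product name.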
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining
Methods: Logistic Regression · Sparse Evolutionary Training · GloVe Embeddings · fastText
