Categorical Classification of Book Summaries Using Word Embedding Techniques

Kerem Keskin; Mümine Kaya Keleş

arXiv:2507.21058 · cs.CL · July 30, 2025

TL;DR

This paper compares several word embedding techniques and machine learning classifiers for assigning Turkish book summaries to categories, and identifies the combinations that work best for this language and task.

Contribution

The study evaluates and compares the effectiveness of different word embedding methods and classifiers specifically for Turkish text classification.

Findings

01

TF-IDF and one-hot encoding perform best when combined with Support Vector Machine (SVM), Naive Bayes, or Logistic Regression classifiers.

02

Support Vector Machine achieved high accuracy with TF-IDF.

03

Word embedding methods vary in success depending on the classifier and language.
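As a rough illustration of finding 01, the sketch below pairs one-hot (binary bag-of-words) features with a multinomial Naive Bayes classifier using Laplace smoothing, in pure Python. The page does not say which Naive Bayes variant, tokenizer, or smoothing the authors used, so the whitespace tokenizer, `alpha=1.0`, the class name `NaiveBayes`, and the toy Turkish summaries and genre labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def one_hot(tokens):
    # One-hot (binary) features: each distinct word counts once per summary.
    return set(tokens)


class NaiveBayes:
    """Multinomial Naive Bayes over binary one-hot features, Laplace smoothing."""

    def fit(self, docs, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)          # label -> number of summaries
        self.feature_counts = defaultdict(Counter)   # label -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = one_hot(tokenize(doc))
            self.feature_counts[label].update(feats)
            self.vocab |= feats
        # Total feature count per label, used in the smoothed denominator.
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feature_counts.items()}
        return self

    def predict(self, doc):
        feats = one_hot(tokenize(doc)) & self.vocab  # ignore words never seen in training
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / n_docs)  # log prior
            denom = self.totals[label] + self.alpha * len(self.vocab)
            for word in feats:
                score += math.log((self.feature_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: "polisiye" (crime) and "bilim kurgu" (science fiction).
docs = [
    "dedektif cinayet davasını çözüyor",
    "katil ve dedektif arasında gerilim",
    "uzay gemisi yeni bir gezegene iniyor",
    "robotlar ve uzay yolculuğu",
]
labels = ["polisiye", "polisiye", "bilim kurgu", "bilim kurgu"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("dedektif yeni bir cinayet araştırıyor"))  # polisiye
```

Working in log space avoids underflow when many word probabilities are multiplied. In practice a library such as scikit-learn (`CountVectorizer(binary=True)` or `TfidfVectorizer` feeding `MultinomialNB`, `LinearSVC`, or `LogisticRegression`) would normally replace hand-rolled code like this.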

Abstract

In this study, book summaries and their categories, collected from book websites, were classified using word embedding methods, natural language processing techniques, and machine learning algorithms. Three frequently used word embedding methods were applied and their performance compared: one-hot encoding, Word2Vec, and Term Frequency-Inverse Document Frequency (TF-IDF). A table showing the combinations of pre-processing methods used is also included. The results show that the Support Vector Machine, Naive Bayes, and Logistic Regression models, together with the TF-IDF and one-hot encoding techniques, gave the most successful results for Turkish texts.
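Two of the representations compared in the abstract, one-hot encoding and TF-IDF, can be sketched in pure Python. This is an illustration under assumptions, not the authors' pipeline: the page names neither the tokenizer nor the TF-IDF weighting, so the regex tokenizer, the Turkish-aware lowercasing step (one plausible pre-processing choice, since Python's `str.lower()` maps `I` to `i` rather than the Turkish `ı`), the `tf = count / length` and `idf = log(N / df)` formulas, the function names, and the sample summaries are all assumptions. Word2Vec, the third method, needs vectors trained on a corpus and is not sketched here.

```python
import math
import re
from collections import Counter


def turkish_lower(text):
    # Assumed pre-processing step. Plain str.lower() maps "I" to "i" and
    # "İ" to "i" plus a combining dot; Turkish needs "I" -> "ı" and "İ" -> "i".
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text):
    # Lowercase, then keep runs of Unicode word characters (drops punctuation).
    return re.findall(r"\w+", turkish_lower(text))


def one_hot(docs):
    """Binary bag-of-words: 1 if a vocabulary term occurs in the document, else 0."""
    token_sets = [set(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*token_sets))
    return vocab, [[1 if term in toks else 0 for term in vocab] for toks in token_sets]


def tfidf(docs):
    """One {term: weight} dict per document; tf = count / length, idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append(
            {term: (c / len(toks)) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return vectors


# Invented sample summaries.
summaries = [
    "Uzay gemisi gezegene iniyor.",
    "Dedektif cinayeti çözüyor.",
    "Uzay yolculuğu başlıyor.",
]
vocab, rows = one_hot(summaries)
weights = tfidf(summaries)
print(turkish_lower("IŞIK"))  # ışık
print(weights[0]["gemisi"] > weights[0]["uzay"])  # True: "uzay" occurs in two summaries
```

The difference between the two is visible in the output: one-hot encoding gives "uzay" and "gemisi" the same value in the first summary, while TF-IDF down-weights "uzay" because it also appears in another summary, which makes the rarer, more category-specific words stand out.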
