An Efficient Classification Model for Cyber Text

Md Sakhawat Hossen; Md. Zashid Iqbal Borshon; A. S. M. Badrudduza

arXiv:2511.03107·cs.LG·November 6, 2025

TL;DR

This paper proposes CTF-IDF, a modified TF-IDF algorithm, and pairs it with classical machine learning and dimensionality reduction to build a text classification model that is more efficient and less resource-intensive than deep learning approaches while reaching comparable accuracy.

Contribution

The paper introduces CTF-IDF and combines it with IRLBA (the implicitly restarted Lanczos bidiagonalization algorithm) for dimensionality reduction, improving efficiency and reducing the computational cost of text classification.
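For orientation, the sketch below computes the standard TF-IDF weighting that CTF-IDF modifies. It is not the paper's CTF-IDF, whose formula is not given on this page. The smoothed IDF variant, the function name `tf_idf`, and the three sample messages are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (vocab, rows) of standard TF-IDF weights for tokenized documents.

    This is the baseline weighting only; it is NOT the paper's CTF-IDF.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    # Smoothed IDF (an assumed variant): terms found in every document
    # still keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        # Relative term frequency multiplied by IDF, one column per vocab term.
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, rows


# Hypothetical sample messages, invented for this example.
docs = [
    "reset your password now".split(),
    "your invoice is attached".split(),
    "click now to reset your account".split(),
]
vocab, matrix = tf_idf(docs)
```

Each row of `matrix` is one document and each column one vocabulary term. In a pipeline of the kind described above, a matrix like this would be reduced to a small number of dimensions before being passed to a classical classifier.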

Findings

01 Significant reduction in training time

02 Improved model accuracy with classical methods

03 Lower carbon footprint compared to deep learning

Abstract

The rise of deep learning methodology and practice in recent years has had a severe consequence: a growing carbon footprint driven by an insatiable demand for computational resources and power. Text analytics has also been transformed by this trend toward a single dominant methodology. In this paper, the original TF-IDF algorithm is modified, and Clement Term Frequency-Inverse Document Frequency (CTF-IDF) is proposed for data preprocessing. The paper primarily discusses the effectiveness of classical machine learning techniques in text analytics when combined with CTF-IDF and a faster IRLBA algorithm for dimensionality reduction. Introducing both techniques into the conventional text analytics pipeline yields a more efficient, faster, and less computationally intensive application when compared with deep learning methodology regarding…

Taxonomy

Topics: Big Data and Digital Economy · Text and Document Classification Technologies · Advanced Graph Neural Networks