The Power of Communities: A Text Classification Model with Automated Labeling Process Using Network Community Detection
Minjun Kim, Hiroki Sayama

TL;DR
This paper introduces a novel network community detection approach to automatically label text data, improving classification accuracy in machine learning models for various NLP applications.
Contribution
It presents a new method using community detection on sentence similarity networks for automatic labeling, enhancing text classification performance.
Findings
Models trained on community-detection labels outperform models trained on human labels by 2.68-3.75% in accuracy.
Support Vector Machine and Random Forest models benefit from network-based labels.
The method supports the development of more accurate conversational and text classification systems.
Abstract
Text classification is one of the most critical areas in machine learning and artificial intelligence research. It has been actively adopted in many business applications such as conversational intelligence systems, news article categorization, sentiment analysis, emotion detection systems, and many recommendation systems in our daily life. One of the problems in supervised text classification models is that the models' performance depends heavily on the quality of data labeling that is typically done by humans. In this study, we propose a new network community detection-based approach to automatically label and classify text data into multiclass value spaces. Specifically, we build networks with sentences as the network nodes and pairwise cosine similarities between the Term Frequency-Inverse Document Frequency (TF-IDF) vector representations of the sentences as the network…
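The network construction described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: the abstract is cut off, so the cosine similarities are assumed to serve as edge weights; the excerpt does not name a community detection algorithm, so a simple weighted label propagation is swapped in as a dependency-free stand-in; and the whitespace tokenizer, the smoothed IDF formula, the example sentences, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict: term -> weight) per sentence."""
    tokenized = [s.lower().split() for s in sentences]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF variant (assumption): log((1 + N) / (1 + df)) + 1
        vectors.append({
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_network(sentences):
    """Weighted adjacency map: adj[i][j] = cosine similarity of sentences i and j."""
    vectors = tfidf_vectors(sentences)
    adj = defaultdict(dict)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            weight = cosine(vectors[i], vectors[j])
            if weight > 0.0:  # keep only pairs that share at least one term
                adj[i][j] = weight
                adj[j][i] = weight
    return adj


def label_propagation(adj, n_nodes, max_iter=100):
    """Stand-in community detection: each node repeatedly adopts the label
    with the largest total edge weight among its neighbours."""
    labels = list(range(n_nodes))  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in range(n_nodes):
            neighbours = adj.get(node, {})
            if not neighbours:
                continue
            scores = defaultdict(float)
            for other, weight in neighbours.items():
                scores[labels[other]] += weight
            # Highest score wins; ties go to the smallest label for determinism.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy corpus: two intents with no shared vocabulary.
sentences = [
    "book a flight to tokyo",
    "book a cheap flight ticket",
    "cancel my flight ticket",
    "play some jazz music",
    "play relaxing piano music",
    "stop the music now",
]
network = similarity_network(sentences)
labels = label_propagation(network, len(sentences))
for label, sentence in zip(labels, sentences):
    print(label, sentence)
```

The community IDs produced this way would then stand in for human-assigned class labels when training a supervised classifier. Label propagation is sensitive to node order and can merge everything into one community on a densely connected graph, so a modularity-based algorithm may be a better fit for real data.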
Taxonomy
Topics: Spam and Phishing Detection · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
