Short Text Classification Improved by Feature Space Extension
Yanxuan Li

TL;DR
This paper introduces TB-CNN, which combines Latent Dirichlet Allocation (LDA) with a convolutional neural network (CNN). The combination extends the feature space of short texts and so mitigates the sparsity problem in short text classification.
Contribution
It proposes a new topic-based CNN model that integrates LDA-generated topic words to improve short text classification performance.
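The topic-word step can be sketched in Python. This is a minimal sketch, not the paper's implementation. It assumes LDA is fitted by collapsed Gibbs sampling on a corpus that includes the short texts. It also assumes that a text's topic words are the most frequent words of its dominant topic that do not already appear in the text. The function names `lda_gibbs` and `extend_with_topic_words`, the hyperparameters `alpha`, `beta` and `iters`, and the count `n` are illustrative assumptions. The selection rule TB-CNN actually uses is not specified here.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return topic-word and doc-topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    ndk = [[0] * num_topics for _ in docs]        # topic counts per document
    nkw = [Counter() for _ in range(num_topics)]  # word counts per topic
    nk = [0] * num_topics                         # total tokens per topic
    z = []                                        # topic assignment per token

    # Randomly assign every token to an initial topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(num_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return nkw, ndk


def extend_with_topic_words(doc, nkw, doc_topic_counts, n=3):
    """Append up to n words from the document's dominant topic (assumed rule)."""
    k = max(range(len(doc_topic_counts)), key=doc_topic_counts.__getitem__)
    ranked = [w for w, c in nkw[k].most_common() if c > 0 and w not in doc]
    return doc + ranked[:n]


# Toy corpus of short texts (illustrative data, not from the paper).
docs = [["great", "film", "acting"], ["boring", "plot", "slow"],
        ["great", "acting", "cast"], ["slow", "boring", "film"]]
nkw, ndk = lda_gibbs(docs, num_topics=2, iters=50, seed=1)
print(extend_with_topic_words(docs[0], nkw, ndk[0], n=2))
```

The extended text keeps the original tokens and appends up to `n` topic words. Which words appear depends on the sampler's random state.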
Findings
TB-CNN outperforms traditional CNN methods on the IMDB movie review dataset.
Extending the feature space with topic words improves classification accuracy.
The approach reduces the sparsity problem in short text data.
Abstract
With the explosive growth of the mobile Internet, short texts have become widespread. Short text classification differs from long document classification because short texts are brief and sparse. This makes short text classification challenging, since each text carries little semantic information. In this paper, we propose a novel topic-based convolutional neural network (TB-CNN) built on the Latent Dirichlet Allocation (LDA) model and a convolutional neural network (CNN). Compared with traditional CNN methods, TB-CNN uses the LDA model to generate topic words that reduce sparsity. It then combines the embedding vectors of the topic words and the input words to extend the feature space of the short text. Validation results on the IMDB movie review dataset demonstrate the improvement and effectiveness of TB-CNN.
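The feature-space extension and convolution can be sketched in pure Python. This sketch rests on several stated assumptions:

- "Combining" the embeddings is taken to mean appending topic-word embedding rows after the input-word rows.
- The CNN is a single convolution layer with ReLU and max-over-time pooling, followed by a logistic output for binary sentiment.
- `embed` is a hypothetical stand-in for a pretrained embedding table.

None of the names, sizes or weights below come from the paper, and the model is untrained.

```python
import math
import random

DIM = 8  # embedding size, chosen only for illustration


def embed(word, dim=DIM):
    """Hypothetical stand-in for a pretrained embedding lookup: a fixed
    pseudo-random vector per word, seeded deterministically by the word."""
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def extended_matrix(input_words, topic_words):
    """Assumed feature-space extension: topic-word rows follow input-word rows."""
    return [embed(w) for w in input_words + topic_words]


def conv_maxpool(matrix, filt, bias=0.0):
    """Slide one width-h filter over consecutive rows, apply ReLU,
    then keep the maximum activation (max-over-time pooling)."""
    h = len(filt)
    if len(matrix) < h:
        raise ValueError("text has fewer rows than the filter width")
    activations = []
    for i in range(len(matrix) - h + 1):
        s = bias
        for j in range(h):
            s += sum(f * x for f, x in zip(filt[j], matrix[i + j]))
        activations.append(max(0.0, s))
    return max(activations)


def make_filters(num, width, dim=DIM, seed=0):
    """Randomly initialised filters; training is out of scope here."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(width)]
            for _ in range(num)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tb_cnn_forward(input_words, topic_words, filters, out_weights, out_bias=0.0):
    """Return an (untrained) probability of the positive class."""
    m = extended_matrix(input_words, topic_words)
    pooled = [conv_maxpool(m, f) for f in filters]
    z = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return sigmoid(z)


filters = make_filters(num=4, width=3)
p = tb_cnn_forward(["great", "film"], ["movie", "acting", "plot"], filters, [0.5] * 4)
print(p)
```

With the two-word input `["great", "film"]` alone, a width-3 filter has no valid position, so `conv_maxpool` raises `ValueError`. Appending three topic words gives five rows and three filter positions. This is one concrete way to see how the extension gives the convolution more to work with in this sketch.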
Taxonomy
Methods: Latent Dirichlet Allocation
