Efficient Text Classification Using Tree-structured Multi-linear Principal Component Analysis
Yuanhang Su, Yuzhong Huang, C.-C. Jay Kuo

TL;DR
This paper introduces TMPCA, a tree-structured multi-linear PCA technique that reduces the dimension of input text sequences at lower computational complexity than ordinary PCA. An SVM applied to the TMPCA-processed data matches or outperforms RNNs on text classification.
Contribution
The paper presents TMPCA, a new dimension reduction method that is more efficient than traditional PCA and enhances text classification performance.
Findings
TMPCA reduces data dimensionality at lower computational complexity than ordinary PCA.
SVM with TMPCA achieves accuracy comparable to or better than that of RNNs.
Experimental results validate TMPCA's efficiency and effectiveness.
Abstract
A novel text data dimension reduction technique, called the tree-structured multi-linear principal component analysis (TMPCA), is proposed in this work. Being different from traditional text dimension reduction methods that deal with the word-level representation, the TMPCA technique reduces the dimension of input sequences and sentences to simplify the following text classification tasks. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than the ordinary principal component analysis (PCA). Furthermore, it is demonstrated by experimental results that the support vector machine (SVM) method applied to the TMPCA-processed data achieves commensurable or better performance than the state-of-the-art recurrent neural network (RNN) approach.
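The abstract does not spell out the procedure, so the following is only a minimal sketch of one way a tree-structured PCA over a sequence could look. It assumes each sentence is a sequence of N word vectors of dimension D, with N a power of two, that adjacent non-overlapping pairs are concatenated into 2D-dimensional vectors, and that one PCA per tree level projects them back to D dimensions, so a sentence collapses to a single D-dimensional vector after log2(N) levels. The names fit_pca and tmpca_fit_transform are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def fit_pca(X, k):
    """Fit PCA on rows of X; return (mean, components) with components of shape (d, k)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T


def tmpca_fit_transform(seqs):
    """Reduce sequences of shape (n_samples, N, D) to features of shape (n_samples, D).

    Assumption: N is a power of two and n_samples >= D.
    """
    n, N, D = seqs.shape
    assert N & (N - 1) == 0, "sequence length must be a power of two"
    x = np.asarray(seqs, dtype=float)
    stages = []
    while x.shape[1] > 1:
        # Concatenate adjacent non-overlapping pairs: (n, L, D) -> (n * L/2, 2D).
        flat = x.reshape(n, x.shape[1] // 2, 2 * D).reshape(-1, 2 * D)
        # One PCA per tree level, projecting each pair from 2D back to D.
        mean, comps = fit_pca(flat, D)
        stages.append((mean, comps))
        x = ((flat - mean) @ comps).reshape(n, -1, D)
    return x[:, 0, :], stages


# Usage on random data standing in for word embeddings (illustrative only).
rng = np.random.default_rng(0)
seqs = rng.normal(size=(50, 8, 4))
features, stages = tmpca_fit_transform(seqs)
print(features.shape)  # (50, 4)
```

Under these assumptions each level only ever decomposes 2D-dimensional vectors, whereas ordinary PCA on the flattened sentence would work with N*D-dimensional vectors. The resulting features could then be passed to a classifier such as an SVM, as the abstract describes.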
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Computational Techniques and Applications
