Efficient Text Classification Using Tree-structured Multi-linear Principal Component Analysis

Yuanhang Su; Yuzhong Huang; C.-C. Jay Kuo

arXiv:1801.06607 · cs.CL · February 27, 2018 · 5 cites

TL;DR

This paper introduces TMPCA, a tree-structured multi-linear PCA technique that reduces the dimension of input text sequences and sentences at much lower computational complexity than ordinary PCA. An SVM applied to TMPCA-processed data matches or outperforms state-of-the-art RNNs on text classification.

Contribution

The paper presents TMPCA, a new dimension reduction method that is more efficient than traditional PCA and enhances text classification performance.

Findings

01

TMPCA reduces the dimension of input sequences at much lower complexity than ordinary PCA.

02

An SVM applied to TMPCA-processed data achieves accuracy comparable to or better than that of state-of-the-art RNNs.

03

Experimental results validate TMPCA's efficiency and effectiveness.

Abstract

A novel text data dimension reduction technique, called the tree-structured multi-linear principal component analysis (TMPCA), is proposed in this work. Being different from traditional text dimension reduction methods that deal with the word-level representation, the TMPCA technique reduces the dimension of input sequences and sentences to simplify the following text classification tasks. It is shown mathematically and experimentally that the TMPCA tool demands much lower complexity (and, hence, less computing power) than the ordinary principal component analysis (PCA). Furthermore, it is demonstrated by experimental results that the support vector machine (SVM) method applied to the TMPCA-processed data achieves commensurable or better performance than the state-of-the-art recurrent neural network (RNN) approach.
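
The abstract does not spell out the procedure, so the following is only a rough sketch of how a tree-structured PCA over a sequence might look, not the authors' implementation. It assumes each input is a sequence of N word vectors of dimension D with N a power of two, and that each tree level concatenates non-overlapping pairs of adjacent vectors and projects the 2D-dimensional result back to D dimensions with PCA. The function names `fit_pca` and `tmpca_fit_transform` are hypothetical.

```python
import numpy as np


def fit_pca(X, k):
    """Return (mean, components) for the top-k principal directions of X.

    X has one sample per row; components has shape (n_features, k).
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(cov)
    return mean, eigvecs[:, ::-1][:, :k]


def tmpca_fit_transform(seqs):
    """Reduce sequences of shape (num_samples, N, D) to (num_samples, D).

    Assumption (not stated in the abstract): N is a power of two and each
    level merges adjacent, non-overlapping pairs of vectors.
    """
    x = np.asarray(seqs, dtype=float)
    n_samples, n, d = x.shape
    if n < 1 or n & (n - 1):
        raise ValueError("sequence length must be a power of two")

    levels = []  # per-level (mean, components), needed to transform new data
    while x.shape[1] > 1:
        # Concatenate vectors 2i and 2i+1 into one vector of length 2D.
        flat = x.reshape(-1, 2 * d)
        mean, comps = fit_pca(flat, d)
        # Project each pair back to D dimensions; sequence length halves.
        x = ((flat - mean) @ comps).reshape(n_samples, -1, d)
        levels.append((mean, comps))
    return x[:, 0, :], levels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sentences = rng.normal(size=(200, 8, 4))  # synthetic stand-in for embeddings
    reduced, levels = tmpca_fit_transform(sentences)
    print(reduced.shape, len(levels))
```

Under these assumptions, every level works with a 2D × 2D covariance matrix, whereas ordinary PCA on the flattened sequence would need an (N·D) × (N·D) covariance matrix. The reduced vectors could then be passed to any standard classifier, such as the SVM mentioned in the abstract.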

Taxonomy

Topics: Text and Document Classification Technologies · Advanced Computational Techniques and Applications