Tree-structured multi-stage principal component analysis (TMPCA): theory and applications
Yuanhang Su, Ruiyuan Lin, C.-C. Jay Kuo

TL;DR
This paper introduces TMPCA, a PCA-based sequence-to-vector dimension reduction method for text classification that preserves sequence structure, needs no labeled training data, and is reported to outperform fastText and neural-network baselines.
Contribution
The paper presents TMPCA, a new sequence-level PCA extension that preserves input sequence structure and improves text classification performance without requiring labeled training data.
Findings
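TMPCA reduces dimension at the sequence level while preserving the sequential structure of the input.
In the reported experiments, a dense network trained on TMPCA-preprocessed data outperforms fastText and neural-network baselines.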
TMPCA is computationally efficient for sequence processing.
Abstract
A PCA-based sequence-to-vector (seq2vec) dimension reduction method for the text classification problem, called tree-structured multi-stage principal component analysis (TMPCA), is presented in this paper. Theoretical analysis and applicability of TMPCA are demonstrated as an extension to our previous work (Su, Huang & Kuo). Unlike conventional word-to-vector embedding methods, the TMPCA method conducts dimension reduction at the sequence level without labeled training data. Furthermore, it can preserve the sequential structure of input sequences. We show mathematically that TMPCA is computationally efficient and facilitates sequence-based text classification tasks by preserving strong mutual information between its input and output. Experimental results also demonstrate that a dense (fully connected) network trained on the TMPCA-preprocessed data achieves better…
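To make the idea of a tree-structured, multi-stage PCA concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes one plausible reading of the name: at each stage, adjacent non-overlapping pairs of D-dimensional vectors are concatenated into 2D-dimensional vectors and projected back to D dimensions by PCA, so a sequence of length N (assumed to be a power of two) collapses to a single vector after log2(N) stages. The function names `fit_pca` and `tmpca_fit_transform` are hypothetical.

```python
import numpy as np

def fit_pca(X, k):
    """Fit a k-component PCA on the rows of X; return (mean, components)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def tmpca_fit_transform(seqs):
    """Sketch of a tree-structured multi-stage PCA (assumed pairing scheme).

    seqs: array of shape (num_samples, N, D), with N a power of two and
          num_samples >= D so every stage has enough rows for D components.
    Returns (stages, vectors); vectors has shape (num_samples, D).
    """
    X = np.asarray(seqs, dtype=float)
    n, N, D = X.shape
    assert N > 0 and N & (N - 1) == 0, "sequence length must be a power of two"
    stages = []
    while X.shape[1] > 1:
        half = X.shape[1] // 2
        # Concatenate adjacent non-overlapping pairs: (n, L, D) -> (n * L/2, 2D).
        flat = X.reshape(n, half, 2 * D).reshape(-1, 2 * D)
        mean, comps = fit_pca(flat, D)
        # Project each pair back to D dimensions, halving the sequence length.
        X = ((flat - mean) @ comps.T).reshape(n, half, D)
        stages.append((mean, comps))
    return stages, X[:, 0, :]
```

Because each stage only sees unlabeled vector pairs, the sketch needs no class labels, and because pairs are formed from neighbouring positions, the order of the input sequence influences the output vector. The returned `stages` list holds the per-stage mean and projection matrix, which could be reused to transform unseen sequences.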
Taxonomy
Methods: Principal Components Analysis · fastText
