Factor Augmented Supervised Learning with Text Embeddings
Zhanye Luo, Yuefeng Han, Xiufan Yu

TL;DR
This paper introduces AEALT (AutoEncoder-Augmented Learning with Text), a supervised autoencoder framework that reduces the dimensionality of text embeddings produced by large language models, with the aim of lowering computational cost and improving performance on downstream NLP tasks.
Contribution
The paper presents a novel supervised autoencoder approach for dimension reduction of LLM-generated embeddings, enhancing task-specific performance and modeling nonlinear embedding structures.
Findings
AEALT outperforms traditional dimension reduction methods.
AEALT yields significant accuracy improvements in classification and anomaly detection.
Its broad applicability is demonstrated across multiple real-world datasets.
Abstract
Large language models (LLMs) generate text embeddings from text data, producing vector representations that capture the semantic meaning and contextual relationships of words. However, the high dimensionality of these embeddings often impedes efficiency and drives up computational cost in downstream tasks. To address this, we propose AutoEncoder-Augmented Learning with Text (AEALT), a supervised, factor-augmented framework that incorporates dimension reduction directly into pre-trained LLM workflows. First, we extract embeddings from text documents; next, we pass them through a supervised augmented autoencoder to learn low-dimensional, task-relevant latent factors. By modeling the nonlinear structure of complex embeddings, AEALT outperforms conventional deep-learning approaches that rely on raw embeddings. We validate its broad applicability with extensive experiments on classification,…
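The two-step pipeline described in the abstract (embeddings in, low-dimensional task-relevant factors out) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: a generic one-hidden-layer supervised autoencoder trained with plain gradient descent in NumPy, a joint loss of reconstruction error plus a weighted binary cross-entropy term, hypothetical names such as `train_supervised_autoencoder` and `lam`, and random vectors standing in for LLM embeddings.

```python
import numpy as np

def train_supervised_autoencoder(X, y, latent_dim=4, lam=1.0, lr=0.1,
                                 epochs=300, seed=0):
    """Fit a small supervised autoencoder by full-batch gradient descent.

    X: (n, d) embedding matrix; y: (n,) binary labels in {0, 1}.
    Loss = mean squared reconstruction error + lam * binary cross-entropy.
    (Hypothetical architecture; the paper's exact design is not given here.)
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, latent_dim))
    b_enc = np.zeros(latent_dim)
    W_dec = rng.normal(scale=0.1, size=(latent_dim, d))
    b_dec = np.zeros(d)
    w_clf = rng.normal(scale=0.1, size=latent_dim)
    b_clf = 0.0
    history = []
    for _ in range(epochs):
        # Forward pass: encode, reconstruct, and predict the label.
        Z = np.tanh(X @ W_enc + b_enc)            # latent factors, (n, k)
        X_hat = Z @ W_dec + b_dec                 # reconstruction, (n, d)
        p = 1.0 / (1.0 + np.exp(-(Z @ w_clf + b_clf)))
        recon = np.mean((X_hat - X) ** 2)
        bce = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        history.append(recon + lam * bce)
        # Backward pass: gradients of the joint loss.
        dX_hat = 2.0 * (X_hat - X) / (n * d)
        dlogits = lam * (p - y) / n
        dZ = dX_hat @ W_dec.T + np.outer(dlogits, w_clf)
        dA = dZ * (1.0 - Z ** 2)                  # derivative of tanh
        # Parameter updates.
        W_dec -= lr * (Z.T @ dX_hat)
        b_dec -= lr * dX_hat.sum(axis=0)
        w_clf -= lr * (Z.T @ dlogits)
        b_clf -= lr * dlogits.sum()
        W_enc -= lr * (X.T @ dA)
        b_enc -= lr * dA.sum(axis=0)
    return {"W_enc": W_enc, "b_enc": b_enc}, history

def encode(X, params):
    """Map embeddings to the learned low-dimensional factors."""
    return np.tanh(X @ params["W_enc"] + params["b_enc"])

# Synthetic stand-in for LLM embeddings: 200 documents, 16 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(float)         # toy binary task
params, history = train_supervised_autoencoder(X, y)
Z = encode(X, params)                             # (200, 4) latent factors
```

In this sketch the weight `lam` trades off reconstruction against label prediction; setting it to zero recovers an ordinary unsupervised autoencoder. The factors in `Z` would then be passed to a downstream model in place of the raw embeddings.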
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Computational and Text Analysis Methods
