Factor Augmented Supervised Learning with Text Embeddings

Zhanye Luo; Yuefeng Han; Xiufan Yu

arXiv:2508.06548 · cs.CL · August 12, 2025

TL;DR

This paper introduces AEALT, a supervised autoencoder framework that reduces the dimensionality of text embeddings from large language models, improving efficiency and performance across various NLP tasks.

Contribution

The paper presents a novel supervised autoencoder approach to reducing the dimensionality of LLM-generated embeddings; the approach models nonlinear structure in the embeddings and improves task-specific performance.
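One common way to write a supervised autoencoder objective combines a reconstruction term with a task loss computed from the latent factors. The form below is an illustrative assumption, not the paper's stated loss. Here $x_i \in \mathbb{R}^d$ is the embedding of document $i$, $f_{\theta_e}$ is an encoder producing factors $z_i \in \mathbb{R}^k$ with $k \ll d$, $g_{\theta_d}$ is a decoder, $h_{\theta_h}$ is a prediction head, $\ell$ is a task loss such as cross-entropy, and $\lambda \ge 0$ is a hypothetical trade-off weight:

$$
\min_{\theta_e,\,\theta_d,\,\theta_h}\; \frac{1}{n}\sum_{i=1}^{n}\Big[\big\lVert x_i - g_{\theta_d}\big(f_{\theta_e}(x_i)\big)\big\rVert_2^2 \;+\; \lambda\,\ell\big(y_i,\; h_{\theta_h}\big(f_{\theta_e}(x_i)\big)\big)\Big]
$$

Because both terms depend on the shared encoder $f_{\theta_e}$, the learned factors $z_i$ are pushed to be useful for prediction as well as for reconstruction; with $\lambda = 0$ the objective reduces to an ordinary unsupervised autoencoder.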

Findings

01

AEALT outperforms traditional dimension reduction methods.

02

Significant accuracy improvements in classification and anomaly detection.

03

Demonstrated broad applicability across multiple real-world datasets.

Abstract

Large language models (LLMs) generate text embeddings from text data, producing vector representations that capture the semantic meaning and contextual relationships of words. However, the high dimensionality of these embeddings often impedes efficiency and drives up computational cost in downstream tasks. To address this, we propose AutoEncoder-Augmented Learning with Text (AEALT), a supervised, factor-augmented framework that incorporates dimension reduction directly into pre-trained LLM workflows. First, we extract embeddings from text documents; next, we pass them through a supervised augmented autoencoder to learn low-dimensional, task-relevant latent factors. By modeling the nonlinear structure of complex embeddings, AEALT outperforms conventional deep-learning approaches that rely on raw embeddings. We validate its broad applicability with extensive experiments on classification,…
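The two-stage pipeline in the abstract (embeddings in, low-dimensional supervised factors out) can be sketched in NumPy. This is a minimal illustration under our own assumptions, not the authors' AEALT implementation: one tanh encoder layer, a linear decoder, a softmax head on the latent factors, and a loss equal to mean squared reconstruction error (averaged over samples and coordinates) plus a hypothetical weight `lam` times cross-entropy, trained by full-batch gradient descent. The function names (`train_supervised_ae`, `forward`, `loss_and_grads`) and the synthetic "embeddings" are hypothetical stand-ins for real LLM outputs.

```python
# Hypothetical sketch: architecture, loss weighting and data are assumptions, not the AEALT code.
import numpy as np


def init_params(d, k, n_classes, rng, scale=0.1):
    """Small random weights for encoder, decoder and classification head."""
    return {
        "W_enc": rng.normal(0.0, scale, (d, k)), "b_enc": np.zeros(k),
        "W_dec": rng.normal(0.0, scale, (k, d)), "b_dec": np.zeros(d),
        "W_cls": rng.normal(0.0, scale, (k, n_classes)), "b_cls": np.zeros(n_classes),
    }


def forward(p, X):
    """Encode embeddings to k latent factors, then decode and classify from them."""
    Z = np.tanh(X @ p["W_enc"] + p["b_enc"])            # nonlinear latent factors
    X_hat = Z @ p["W_dec"] + p["b_dec"]                 # linear reconstruction
    logits = Z @ p["W_cls"] + p["b_cls"]                # task head on the factors
    logits -= logits.max(axis=1, keepdims=True)         # softmax stability shift
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return Z, X_hat, probs


def loss_and_grads(p, X, Y, lam):
    """Joint objective: per-entry MSE reconstruction + lam * cross-entropy."""
    n, d = X.shape
    Z, X_hat, probs = forward(p, X)
    rec = np.mean((X_hat - X) ** 2)
    ce = -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=1))
    # Gradients of each term with respect to its output layer.
    dX_hat = 2.0 * (X_hat - X) / (n * d)
    dlogits = lam * (probs - Y) / n
    g = {
        "W_dec": Z.T @ dX_hat, "b_dec": dX_hat.sum(axis=0),
        "W_cls": Z.T @ dlogits, "b_cls": dlogits.sum(axis=0),
    }
    # Both terms backpropagate into the shared encoder (the "supervised" part).
    dA = (dX_hat @ p["W_dec"].T + dlogits @ p["W_cls"].T) * (1.0 - Z ** 2)
    g["W_enc"] = X.T @ dA
    g["b_enc"] = dA.sum(axis=0)
    return rec + lam * ce, g


def train_supervised_ae(X, y, k=8, lam=1.0, lr=0.05, epochs=200, seed=0):
    """Full-batch gradient descent on the joint loss; returns params and loss history."""
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    Y = np.eye(n_classes)[y]                            # one-hot labels
    p = init_params(X.shape[1], k, n_classes, rng)
    history = []
    for _ in range(epochs):
        loss, g = loss_and_grads(p, X, Y, lam)
        history.append(loss)
        for name in p:
            p[name] -= lr * g[name]
    return p, history


# Synthetic stand-in for LLM embeddings: 64-dim vectors driven by 4 hidden factors.
rng = np.random.default_rng(1)
F = rng.normal(size=(300, 4))
X = F @ rng.normal(size=(4, 64)) + 0.1 * rng.normal(size=(300, 64))
X = (X - X.mean(axis=0)) / X.std(axis=0)                # standardise each coordinate
y = (F[:, 0] + 0.5 * F[:, 1] ** 2 > 0.5).astype(int)    # nonlinear binary label

params, history = train_supervised_ae(X, y)
Z = forward(params, X)[0]                               # low-dimensional factors
```

Sharing the encoder between the reconstruction and classification terms is what makes the factors task-relevant in this sketch; setting `lam=0` turns it into a plain unsupervised autoencoder. The resulting `Z` (8 columns instead of 64) could then be fed to any downstream model in place of the raw embeddings.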

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Computational and Text Analysis Methods