inversedMixup: Data Augmentation via Inverting Mixed Embeddings
Fanshuang Kong, Richong Zhang, Qiyu Sun, Zhijie Nie, Ting Deng, Chunming Hu

TL;DR
inversedMixup is a data augmentation method that reconstructs human-readable sentences from mixed embeddings. It does so by aligning the output embedding space of a task-specific model with the input embedding space of a large language model, which keeps the mixing ratio controllable while making the augmented samples interpretable.
Contribution
It introduces a three-stage training framework for embedding alignment, enabling interpretable mixed-sentence generation and addressing manifold intrusion in text Mixup.
Findings
Effective in few-shot and fully supervised settings
First empirical evidence of manifold intrusion in text Mixup
Improves augmentation quality and interpretability
Abstract
Mixup generates augmented samples by linearly interpolating inputs and labels with a controllable ratio. However, since it operates at the latent embedding level, the resulting samples are not human-interpretable. In contrast, LLM-based augmentation methods produce sentences via prompts at the token level, yielding readable outputs but offering limited control over the generation process. Inspired by recent advances in LLM inversion, which reconstructs natural language from embeddings and helps bridge the gap between latent embedding space and discrete token space, we propose inversedMixup, a unified framework that combines the controllability of Mixup with the interpretability of LLM-based generation. Specifically, inversedMixup adopts a three-stage training procedure to align the output embedding space of a task-specific model with the input embedding space of an LLM. Upon successful…
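The interpolation step that the abstract attributes to Mixup can be sketched in a few lines. This is only an illustration of generic Mixup on embedding vectors and one-hot labels, not the inversedMixup pipeline: the function names `mixup` and `sample_ratio` are hypothetical, and drawing the ratio from a Beta(alpha, alpha) distribution follows the original Mixup formulation rather than anything stated in this abstract.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def sample_ratio(alpha: float = 1.0, rng: Optional[random.Random] = None) -> float:
    """Draw a mixing ratio in [0, 1] from Beta(alpha, alpha).

    Assumption: Beta sampling comes from the original Mixup formulation;
    the abstract only says the ratio is controllable.
    """
    rng = rng or random.Random()
    return rng.betavariate(alpha, alpha)


def mixup(x_a: Vector, y_a: Vector, x_b: Vector, y_b: Vector,
          lam: float) -> Tuple[Vector, Vector]:
    """Linearly interpolate two (embedding, label) pairs with ratio lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("both samples must have matching dimensions")
    # The same ratio is applied to the inputs and to the labels.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


if __name__ == "__main__":
    # Toy 2-d "embeddings" with one-hot labels for two classes.
    x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.75)
    print(x_mix, y_mix)
```

The mixed vector `x_mix` lives in embedding space and has no direct textual form, which is the interpretability gap the abstract describes.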
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Natural Language Processing Techniques
