TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation
Moon Ye-Bin, Jisoo Kim, Hongyeob Kim, Kilho Son, Tae-Hyun Oh

TL;DR
TextManiA leverages pre-trained language models to semantically enrich visual features, improving performance especially when training data are imbalanced or scarce.
Contribution
The paper presents a text-driven manifold augmentation technique that transfers representations from a pre-trained language encoder, which need not have been trained on visual data, into the visual feature space being learned.
Findings
Effective in class imbalance scenarios
Enhances visual features with semantic information
Compatible with label mix-based methods
Abstract
We propose TextManiA, a text-driven manifold augmentation method that semantically enriches visual feature spaces, regardless of class distribution. TextManiA augments visual data with intra-class semantic perturbation by exploiting easy-to-understand visually mimetic words, i.e., attributes. This work is built on an interesting hypothesis that general language models, e.g., BERT and GPT, encompass visual information to some extent, even without training on visual training data. Given the hypothesis, TextManiA transfers pre-trained text representation obtained from a well-established large language encoder to a target visual feature space being learned. Our extensive analysis hints that the language encoder indeed encompasses visual information at least useful to augment visual representation. Our experiments demonstrate that TextManiA is particularly powerful in scarce samples with…
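The idea of intra-class semantic perturbation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the hash-based `toy_text_encoder` stands in for a real pre-trained language encoder, the "a photo of a ..." prompt template is hypothetical, and the fixed random `projection` matrix stands in for whatever mapping from text space to visual feature space the method actually learns. The sketch only shows the general pattern: embed a class name with and without an attribute word, take the difference, and add that difference to a visual feature while keeping the class label unchanged.

```python
import hashlib
import numpy as np

TEXT_DIM = 16  # hypothetical text-embedding size
VIS_DIM = 8    # hypothetical visual-feature size


def toy_text_encoder(sentence: str) -> np.ndarray:
    """Stand-in for a pre-trained language encoder (assumption, not the paper's model).

    Each word maps to a fixed pseudo-random vector derived from its hash;
    the sentence embedding is the mean of its word vectors.
    """
    vectors = []
    for word in sentence.lower().split():
        seed = int.from_bytes(hashlib.sha256(word.encode()).digest()[:4], "little")
        vectors.append(np.random.default_rng(seed).standard_normal(TEXT_DIM))
    return np.mean(vectors, axis=0)


def attribute_difference(class_name: str, attribute: str) -> np.ndarray:
    """Text-space direction induced by adding an attribute word to a class prompt."""
    base = toy_text_encoder(f"a photo of a {class_name}")
    attributed = toy_text_encoder(f"a photo of a {attribute} {class_name}")
    return attributed - base


def augment(visual_feature, class_name, attribute, projection, scale=1.0):
    """Shift a visual feature along the projected attribute direction.

    The class label is left untouched, so the perturbation stays intra-class.
    """
    delta = attribute_difference(class_name, attribute)
    return visual_feature + scale * (projection @ delta)


# Usage: perturb one (random) visual feature of class "bull" with attribute "red".
rng = np.random.default_rng(0)
projection = 0.1 * rng.standard_normal((VIS_DIM, TEXT_DIM))  # would be learned in practice
feature = rng.standard_normal(VIS_DIM)
augmented = augment(feature, "bull", "red", projection)
print(augmented.shape)
```

In a real pipeline the toy encoder would be replaced by an actual language model and the projection would be trained jointly with the visual model; the `scale` parameter is a hypothetical knob for the perturbation strength.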
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Handwritten Text Recognition Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Warmup With Linear Decay · WordPiece · Softmax · Dropout · BERT · Layer Normalization
