Visually-augmented pretrained language models for NLP tasks without images
Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Qinyu Zhang, and Ji-Rong Wen

TL;DR
This paper introduces VAWI, a novel method to enhance pretrained language models with visual semantics without using images, leading to improved performance across multiple NLP tasks.
Contribution
The paper proposes VAWI, a visual augmentation technique that improves PLMs without relying on images, addressing limitations of existing visual knowledge integration methods.
Findings
Consistently improves BERT, RoBERTa, BART, and T5 performance
Outperforms several baselines on ten NLP tasks
Applicable to various PLMs and tasks without image retrieval or generation
Abstract
Although pre-trained language models (PLMs) have shown impressive performance through text-only self-supervised training, they have been found to lack visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they also conduct the augmentation for the whole input text, without considering whether it is actually needed in specific inputs or tasks. To address these issues, we propose a novel Visually-Augmented fine-tuning approach that can be generally applied to various PLMs or NLP tasks, Without using any retrieved or generated Images, namely VAWI. Experimental results show that our approach can consistently improve the performance of BERT, RoBERTa, BART, and T5 at different scales, and outperform several competitive baselines on ten tasks.…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Inverse Square Root Schedule · Multi-Head Attention · Residual Connection · Weight Decay · Softmax · Adam · WordPiece
