Visually-augmented pretrained language models for NLP tasks without images

Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Qinyu Zhang, and Ji-Rong Wen

arXiv:2212.07937 · cs.CL · May 29, 2023

TL;DR

This paper introduces VAWI, a novel method to enhance pretrained language models with visual semantics without using images, leading to improved performance across multiple NLP tasks.

Contribution

The paper proposes VAWI, a visual augmentation technique that improves pretrained language models (PLMs) without relying on images, addressing the limitations of existing methods for integrating visual knowledge.

Findings

1. Consistently improves the performance of BERT, RoBERTa, BART, and T5.
2. Outperforms several competitive baselines on ten NLP tasks.
3. Applies to various PLMs and tasks without image retrieval or generation.

Abstract

Although pre-trained language models (PLMs) have shown impressive performance with text-only self-supervised training, they have been found to lack visual semantics or commonsense. Existing solutions often rely on explicit images for visual knowledge augmentation (requiring time-consuming retrieval or generation), and they also apply the augmentation to the whole input text, without considering whether it is actually needed for specific inputs or tasks. To address these issues, we propose a novel Visually-Augmented fine-tuning approach that can be generally applied to various PLMs or NLP tasks Without using any retrieved or generated Images, namely VAWI. Experimental results show that our approach consistently improves the performance of BERT, RoBERTa, BART, and T5 at different scales and outperforms several competitive baselines on ten tasks.…
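The abstract contrasts augmenting the whole input text with augmenting only where visual knowledge is actually needed, and doing so without any images. The toy Python sketch below illustrates that contrast under explicit assumptions; it is not VAWI's implementation. The per-token scores (`visual_scores`), the lookup table of visually aligned vectors (`visual_table`, imagined as vectors produced by some text encoder trained with vision-language alignment), the `threshold`, and the additive fusion are all hypothetical stand-ins, since the abstract as given does not specify how tokens are selected or how visual features are fused into the PLM.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("dimension mismatch")
    return [x + y for x, y in zip(a, b)]


def selective_visual_augment(
    tokens: Sequence[str],
    text_embeddings: Sequence[Sequence[float]],
    visual_scores: Sequence[float],
    visual_table: Dict[str, Vector],
    threshold: float = 0.5,
) -> List[Vector]:
    """Add a visually aligned vector only to tokens judged to need it.

    HYPOTHETICAL sketch: visual_scores stands in for some token selector, and
    visual_table stands in for vectors from a vision-language-aligned text
    encoder. No image is retrieved or generated at any point.
    """
    if not len(tokens) == len(text_embeddings) == len(visual_scores):
        raise ValueError("tokens, embeddings, and scores must align")
    out: List[Vector] = []
    for tok, emb, score in zip(tokens, text_embeddings, visual_scores):
        vis = visual_table.get(tok)
        if score >= threshold and vis is not None:
            out.append(add(emb, vis))  # augment this token (additive fusion, assumed)
        else:
            out.append(list(emb))  # keep the text-only representation unchanged
    return out


if __name__ == "__main__":
    tokens = ["the", "red", "apple"]
    embs = [[0.5, 0.0], [0.25, 0.5], [0.75, 0.25]]
    scores = [0.1, 0.9, 0.8]
    table = {"red": [1.0, 0.0], "apple": [0.0, 1.0]}
    print(selective_visual_augment(tokens, embs, scores, table))
    # [[0.5, 0.0], [1.25, 0.5], [0.75, 1.25]]
```

Setting `threshold=0.0` approximates whole-input augmentation (every token with a table entry is augmented), which is the behavior the abstract criticizes; raising it restricts augmentation to the tokens the hypothetical selector flags.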

Code & Models

Repositories

rucaibox/vawi (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Inverse Square Root Schedule · Multi-Head Attention · Residual Connection · Weight Decay · Softmax · Adam · WordPiece