Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?

Tobias Norlund; Lovisa Hagström; Richard Johansson

arXiv:2109.11321·cs.CL·October 1, 2021

TL;DR

This paper proposes a new evaluation method and architecture to measure and enhance visual knowledge transfer in large language models, aiming to reduce hallucinations and improve factual accuracy.

Contribution

It introduces a novel task and filtering method to evaluate visual knowledge transfer, along with a new model architecture incorporating visual imagination.

Findings

01

The evaluation method effectively measures visual knowledge transfer.

02

The proposed architecture shows promising results in leveraging multimodal knowledge.

03

Models with visual imagination outperform baseline models in knowledge transfer tasks.

Abstract

Large language models are known to suffer from the hallucination problem: they are prone to output statements that are false or inconsistent, indicating a lack of knowledge. A proposed solution is to provide the model with additional data modalities that complement the knowledge obtained through text. We investigate the use of visual data to complement the knowledge of large language models by proposing a method for evaluating visual knowledge transfer to text for uni- or multimodal language models. The method is based on two steps: 1) a novel task querying for knowledge of memory colors, i.e. typical colors of well-known objects, and 2) filtering of model training data to clearly separate knowledge contributions. Additionally, we introduce a model architecture that involves a visual imagination step and evaluate it with our proposed method. We find that our method can…
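
The two evaluation steps named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the color vocabulary `COLORS`, the query template, and the helper names `mentions_color_of`, `filter_training_text`, and `memory_color_accuracy` are hypothetical. The sketch handles only single-word, singular object names.

```python
import re
from typing import Callable, Dict, Iterable, List

# Hypothetical color vocabulary; the paper's actual label set is not given here.
COLORS = {"red", "green", "blue", "yellow", "orange", "white",
          "black", "brown", "pink", "purple", "grey"}


def mentions_color_of(text: str, obj: str) -> bool:
    """Return True if the text names the object together with any color word."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return obj.lower() in tokens and bool(tokens & COLORS)


def filter_training_text(texts: Iterable[str], objects: Iterable[str]) -> List[str]:
    """Step 2 (assumed form): drop texts that could leak an object's color
    through language alone, so remaining color knowledge must come from vision."""
    objects = list(objects)
    return [t for t in texts
            if not any(mentions_color_of(t, o) for o in objects)]


def memory_color_accuracy(predict: Callable[[str], str],
                          gold: Dict[str, str]) -> float:
    """Step 1 (assumed form): query a model for the typical color of each
    object and report the fraction of answers matching the gold label."""
    correct = sum(predict(f"What is the color of a {obj}?") == color
                  for obj, color in gold.items())
    return correct / len(gold)
```

Here `predict` stands in for any uni- or multimodal language model wrapped as a function from a text query to a color word. Running `filter_training_text` over the training text before training is what would let a non-trivial `memory_color_accuracy` score be attributed to the visual modality rather than to text.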

Code & Models

Datasets

Lo/clip-bert-data
dataset · 26 dl
