Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It

Yulu Qin; Dheeraj Varghese; Adam Dahlgren Lindstr\"om; Lucia Donatelli; Kanishka Misra; Najoung Kim

arXiv:2507.13328·cs.CL·October 31, 2025

Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It

Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindstr\"om, Lucia Donatelli, Kanishka Misra, Najoung Kim

PDF

Open Access 1 Video

TL;DR

This study investigates how vision-and-language training influences language models, finding it enhances task deployment of taxonomic knowledge without fundamentally altering the underlying representations of this knowledge.

Contribution

The paper demonstrates that VL training improves task-specific application of taxonomic knowledge without significantly changing its core representations.

Findings

01

VL models outperform text-only models on taxonomic question-answering tasks

02

Taxonomic knowledge remains largely unchanged by VL training

03

VL training affects how models represent taxonomic vs. non-taxonomic concepts

Abstract

Does vision-and-language (VL) training change the linguistic representations of language models in meaningful ways? Most results in the literature have shown inconsistent or marginal differences, both behaviorally and representationally. In this work, we start from the hypothesis that the domain in which VL training could have a significant effect is lexical-conceptual knowledge, in particular its taxonomic organization. Through comparing minimal pairs of text-only LMs and their VL-trained counterparts, we first show that the VL models often outperform their text-only counterparts on a text-only question-answering task that requires taxonomic understanding of concepts mentioned in the questions. Using an array of targeted behavioral and representational analyses, we show that the LMs and VLMs do not differ significantly in terms of their taxonomic knowledge itself, but they differ in…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It· slideslive

Taxonomy

TopicsLexicography and Language Studies · Natural Language Processing Techniques