Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström, Lucia Donatelli, Kanishka Misra, Najoung Kim

TL;DR
This study investigates how vision-and-language (VL) training affects language models. It finds that VL training improves how models deploy taxonomic knowledge on tasks without fundamentally altering the underlying representations of that knowledge.
Contribution
The paper demonstrates that VL training improves task-specific application of taxonomic knowledge without significantly changing its core representations.
Findings
VL models often outperform their text-only counterparts on a text-only question-answering task that requires taxonomic understanding
Taxonomic knowledge remains largely unchanged by VL training
VL training affects how models represent taxonomic vs. non-taxonomic concepts
Abstract
Does vision-and-language (VL) training change the linguistic representations of language models in meaningful ways? Most results in the literature have shown inconsistent or marginal differences, both behaviorally and representationally. In this work, we start from the hypothesis that the domain in which VL training could have a significant effect is lexical-conceptual knowledge, in particular its taxonomic organization. Through comparing minimal pairs of text-only LMs and their VL-trained counterparts, we first show that the VL models often outperform their text-only counterparts on a text-only question-answering task that requires taxonomic understanding of concepts mentioned in the questions. Using an array of targeted behavioral and representational analyses, we show that the LMs and VLMs do not differ significantly in terms of their taxonomic knowledge itself, but they differ in…
Taxonomy
Topics: Lexicography and Language Studies · Natural Language Processing Techniques
