Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning
Neale Ratzlaff, Man Luo, Xin Su, Vasudev Lal, Phillip Howard

TL;DR
This paper investigates how multimodal instruction tuning affects the language reasoning of large language models and proposes a training-free model merging method that mitigates the resulting reasoning degradation and can also improve performance on visual tasks.
Contribution
The paper shows that the effect of multimodal tuning on language reasoning varies with the base LLM and the type of reasoning task, and it introduces a training-free model merging technique to counteract the loss in reasoning performance.
Findings
Multimodal tuning degrades Mistral's language reasoning but improves Vicuna's on most tasks.
Mathematical reasoning performance declines, while commonsense reasoning improves after multimodal tuning.
A training-free model merging method mitigates reasoning degradation and enhances visual task performance.
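The summary above does not say which merging scheme the paper uses. As a rough illustration of what "training-free model merging" can look like, the sketch below assumes plain linear interpolation between the original LLM's weights and the multimodally tuned weights, which is one common merging approach and not necessarily the authors' method. The function name `merge_weights`, the `alpha` parameter, and the toy parameter names are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_weights(base: StateDict, tuned: StateDict, alpha: float = 0.5) -> StateDict:
    """Linearly interpolate the parameters shared by two models.

    alpha = 0.0 returns the base LLM weights, alpha = 1.0 the tuned weights.
    Parameters that exist only in the tuned model (for example a vision
    projector) are copied unchanged. This is an assumed scheme for
    illustration, not the paper's confirmed procedure.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    merged: StateDict = {}
    for name, tuned_values in tuned.items():
        if name in base:
            base_values = base[name]
            if len(base_values) != len(tuned_values):
                raise ValueError(f"shape mismatch for {name}")
            # No gradient steps are involved: only a weighted average.
            merged[name] = [
                (1.0 - alpha) * b + alpha * t
                for b, t in zip(base_values, tuned_values)
            ]
        else:
            merged[name] = list(tuned_values)
    return merged


if __name__ == "__main__":
    base = {"layer.weight": [0.0, 2.0]}
    tuned = {"layer.weight": [1.0, 4.0], "projector.weight": [5.0]}
    print(merge_weights(base, tuned, alpha=0.5))
```

In a sketch like this, `alpha` trades off how much of the original language model is retained against how much of the multimodal adaptation is kept; choosing it would require evaluation on both language and visual benchmarks.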
Abstract
Multimodal models typically combine a powerful large language model (LLM) with a vision encoder and are then trained on multimodal data via instruction tuning. While this process adapts LLMs to multimodal settings, it remains unclear whether this adaptation compromises their original language reasoning capabilities. In this work, we explore the effects of multimodal instruction tuning on language reasoning performance. We focus on LLaVA, a leading multimodal framework that integrates LLMs such as Vicuna or Mistral with the CLIP vision encoder. We compare the performance of the original LLMs with their multimodal-adapted counterparts across eight language reasoning tasks. Our experiments yield several key insights. First, the impact of multimodal learning varies between Vicuna and Mistral: we observe a degradation in language reasoning for Mistral but improvements for Vicuna across most…
Taxonomy
Topics: Speech and dialogue systems · Neurobiology of Language and Bilingualism
Methods: Contrastive Language-Image Pre-training · Focus
