SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization

Hongrui Jia; Chaoya Jiang; Haiyang Xu; Wei Ye; Mengfan Dong; Ming Yan; Ji Zhang; Fei Huang; Shikun Zhang

arXiv:2411.11909 · cs.CV · November 26, 2024

TL;DR

SymDPO introduces a novel training approach for large multimodal models that replaces text answers with symbols, enhancing the models' ability to understand and leverage visual context in demonstrations for improved question answering.

Contribution

This paper proposes SymDPO, a new method that improves multimodal models' understanding of visual context by replacing text answers with symbols during training.

Findings

01. Enhanced multimodal understanding demonstrated on multiple benchmarks.

02. Models trained with SymDPO better utilize visual context in demonstrations.

03. Significant improvement in question-answering accuracy with SymDPO.

Abstract

As language models continue to scale, Large Language Models (LLMs) have exhibited emerging capabilities in In-Context Learning (ICL), enabling them to solve language tasks by prefixing a few in-context demonstrations (ICDs) as context. Inspired by these advancements, researchers have extended these techniques to develop Large Multimodal Models (LMMs) with ICL capabilities. However, existing LMMs face a critical issue: they often fail to effectively leverage the visual context in multimodal demonstrations and instead simply follow textual patterns. This indicates that LMMs do not achieve effective alignment between multimodal demonstrations and model outputs. To address this problem, we propose Symbol Demonstration Direct Preference Optimization (SymDPO). Specifically, SymDPO aims to break the traditional paradigm of constructing multimodal demonstrations by using random symbols to…
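
The symbol-replacement idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated, so the sketch assumes that each text answer in a demonstration is swapped for a random symbol string, applied consistently within one prompt. The helper name `symbolize_demonstrations`, the dictionary keys, and the lowercase symbol alphabet are all hypothetical. The `dpo_loss` function is the standard Direct Preference Optimization loss for a single preference pair, included only to show the kind of objective the method's name refers to.

```python
import math
import random
import string


def symbolize_demonstrations(demos, rng, symbol_len=5):
    """Replace each distinct text answer with a random symbol string.

    Hypothetical helper: the keys ("image", "question", "answer") and the
    lowercase symbol alphabet are assumptions made for illustration.
    Returns new demonstration dicts plus the answer-to-symbol mapping.
    """
    mapping = {}
    used = set()
    symbolized = []
    for demo in demos:
        answer = demo["answer"]
        if answer not in mapping:
            # Draw symbols until we get one not already assigned.
            while True:
                symbol = "".join(
                    rng.choice(string.ascii_lowercase) for _ in range(symbol_len)
                )
                if symbol not in used:
                    break
            used.add(symbol)
            mapping[answer] = symbol
        # Copy the demo so the caller's data is left untouched.
        symbolized.append({**demo, "answer": mapping[answer]})
    return symbolized, mapping


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


if __name__ == "__main__":
    demos = [
        {"image": "img_001.jpg", "question": "What animal is this?", "answer": "dog"},
        {"image": "img_002.jpg", "question": "What animal is this?", "answer": "cat"},
    ]
    symbolized, mapping = symbolize_demonstrations(demos, random.Random(0))
    for demo in symbolized:
        print(demo["image"], demo["question"], "->", demo["answer"])
    print("loss at zero margin:", dpo_loss(0.0, 0.0, 0.0, 0.0))
```

Under this assumption the symbol carries no meaning of its own, so a model can only pick the right symbol for a query by relating the query image to the demonstration images, which is the behavior the abstract says existing LMMs lack.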

Code & Models

Repositories

APiaoG/SymDPO (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling