DAMA: Data- and Model-aware Alignment of Multi-modal LLMs

Jinda Lu; Junkang Wu; Jinghan Li; Xiaojun Jia; Shuo Wang; YiFan Zhang; Junfeng Fang; Xiang Wang; Xiangnan He

arXiv:2502.01943·cs.CV·February 12, 2025

TL;DR

This paper introduces DAMA, a novel approach for aligning multi-modal large language models by dynamically adjusting training based on data difficulty and real-time model responses, leading to improved trustworthiness and effectiveness.

Contribution

DAMA is presented as the first method to account for both data hardness and real-time model responses when aligning multi-modal LLMs.

Findings

01. DAMA significantly reduces hallucinations on Object-HalBench.

02. DAMA outperforms GPT-4V in key alignment metrics.

03. The approach enhances model trustworthiness and task effectiveness.

Abstract

Direct Preference Optimization (DPO) has shown effectiveness in aligning multi-modal large language models (MLLMs) with human preferences. However, existing methods respond unevenly to data of varying hardness, tending to overfit on easy-to-distinguish data while underfitting on hard-to-distinguish data. In this paper, we propose Data- and Model-aware DPO (DAMA) to dynamically adjust the optimization process from two key aspects: (1) a data-aware strategy that incorporates data hardness, and (2) a model-aware strategy that integrates real-time model responses. By combining the two strategies, DAMA enables the model to adapt effectively to data with varying levels of hardness. Extensive experiments on five benchmarks demonstrate that DAMA not only significantly enhances trustworthiness but also improves effectiveness on general tasks. For…
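
As a rough illustration of the objective the abstract builds on, the sketch below computes the standard per-example DPO loss in plain Python. The `beta_scale` argument is a hypothetical knob and is not taken from the paper: it only marks where a per-example adjustment (for instance, one derived from data hardness or from the model's current responses) could enter. How DAMA actually computes its adjustments is not stated in the abstract, and the function names here are illustrative.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    beta_scale: float = 1.0,  # ASSUMPTION: hypothetical per-example factor, not from the paper
) -> float:
    """Standard DPO loss for one preference pair, with an optional scaling of beta."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # Negative log-likelihood of the preference under a Bradley-Terry model.
    return -log_sigmoid(beta * beta_scale * margin)


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so the loss is log(2).
    print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
    # Policy already prefers the chosen response: loss falls below log(2).
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2 regardless of `beta_scale`. For a fixed positive margin, a larger effective beta lowers the loss, which is why a per-example factor changes how strongly each pair influences training.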

Peer Reviews

No public reviews on file for this paper yet.

Code & Models

Repositories

injadlu/damo
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies

Methods: Direct Preference Optimization