Understanding Alignment in Multimodal LLMs: A Comprehensive Study
Elmira Amirloo, Jean-Philippe Fauconnier, Christoph Roesmann, Christian Kerl, Rinu Boney, Yusu Qian, Zirui Wang, Afshin Dehghan, Yinfei Yang, Zhe Gan, Peter Grasch

TL;DR
This paper provides a comprehensive analysis of preference alignment techniques in Multimodal Large Language Models, introduces a novel data creation method, and evaluates their impact on model performance and hallucination reduction.
Contribution
It categorizes alignment algorithms, reviews preference datasets, and proposes Bias-Driven Hallucination Sampling (BDHS) as a new data creation approach for better alignment.
Findings
Combining offline and online alignment methods can enhance performance.
BDHS achieves competitive results without extra annotations.
Dataset construction details significantly affect model outcomes.
Abstract
Preference alignment has become a crucial component in enhancing the performance of Large Language Models (LLMs), yet its impact in Multimodal Large Language Models (MLLMs) remains comparatively underexplored. Similar to language models, MLLMs for image understanding tasks encounter challenges like hallucination. In MLLMs, hallucination can occur not only by stating incorrect facts but also by producing responses that are inconsistent with the image content. A primary objective of alignment for MLLMs is to encourage these models to align responses more closely with image information. Recently, multiple works have introduced preference datasets for MLLMs and examined different alignment methods, including Direct Preference Optimization (DPO) and Proximal Policy Optimization (PPO). However, due to variations in datasets, base model types, and alignment methods, it remains unclear which…
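For readers unfamiliar with DPO, which the abstract names as one of the alignment methods examined, the following is a minimal sketch of the standard DPO loss for a single preference pair. It follows the generic formulation from the DPO literature and is not taken from this paper's implementation; the function name, argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    `beta` (assumed value) scales how strongly the policy is pushed
    away from the reference.
    """
    # Log-ratio of policy to reference for each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp

    # Scaled margin between the chosen and rejected log-ratios.
    margin = beta * (chosen_ratio - rejected_ratio)

    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy and reference agree on both responses, the margin is zero and the loss equals log 2; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one.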
Peer Reviews
Decision · Submitted to ICLR 2025
1. The paper introduces a unique approach to generate preference data for MLLMs by utilizing model biases without human or external model annotations. 2. The paper provides empirical analysis, comparing BDHS with other alignment methods across multiple benchmarks, highlighting its effectiveness and resource efficiency in aligning MLLMs.
1. The proposed data sampling approach partially mitigates hallucination issues in MLLMs but does not completely resolve them. 2. The BDHS method's dependency on hyperparameters, such as mask thresholds, could affect reproducibility across different model implementations.
1. Comprehensive Analysis: The paper provides a detailed comparison of alignment methods, including offline and online strategies, and evaluates their effectiveness using diverse datasets. 2. Novel Data Generation Method: The introduction of BDHS offers a cost-effective alternative to traditional alignment approaches, reducing the need for human annotation or external supervision while maintaining competitive performance.
1. Clarification of Methodological Choices: It would be helpful to better understand why specific thresholds and parameters were chosen for BDHS, such as the similarity score threshold and masking strategy. 2. Generalizability of BDHS: It remains unclear whether BDHS can be effectively applied to models beyond the specific ones studied. Further discussion on its applicability to other MLLMs or domains would strengthen the paper.
1. The study systematically compares offline and online alignment methods, examining their impact on model performance across various metrics like hallucination reduction and response quality. 2. BDHS presents a low-cost, innovative solution to generate preference data, showing competitive results against other data-heavy methods.
1. While the paper examines alignment techniques and datasets, it does not clearly articulate the primary findings from these investigations, which can make it challenging for readers to grasp the significance and implications of the study. 2. BDHS demonstrates promising results; however, its effectiveness may differ across various MLLMs and visual tasks. Conducting additional experiments with diverse model architectures would bolster claims regarding its generalizability.
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Terminology Studies · Translation Studies and Practices
Methods: ALIGN · Balanced Selection
