ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning

Yeyuan Wang; Dehong Gao; Rujiao Long; Lei Yi; Linbo Jin; Libin Yang; Xiaoyan Cai

arXiv:2505.19100 · cs.CL · May 27, 2025

Open Access

TL;DR

ASPO introduces a sentence-level preference optimization method that gives multimodal models fine-grained supervision during alignment, improving response quality without additional models or parameters.

Contribution

This paper proposes ASPO, a novel adaptive sentence-level preference optimization technique that enhances multimodal model alignment by incorporating fine-grained supervision during training.

Findings

01. ASPO significantly improves multimodal model performance.

02. ASPO achieves better content assessment without extra parameters.

03. ASPO outperforms traditional binary preference optimization methods.

Abstract

Direct Preference Optimization (DPO) has gained significant attention for its simplicity and computational efficiency in aligning large language models (LLMs). Recent advancements have extended DPO to multimodal scenarios, achieving strong performance. However, traditional DPO relies on binary preference optimization, rewarding or penalizing entire responses without considering fine-grained segment correctness, leading to suboptimal solutions. The root of this issue lies in the absence of fine-grained supervision during the optimization process. To address this, we propose Adaptive Sentence-level Preference Optimization (ASPO), which evaluates individual sentences for more precise preference optimization. By dynamically calculating adaptive rewards at the sentence level based on model predictions, ASPO enhances response content assessment without additional models or parameters. This…
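The contrast the abstract draws between response-level and sentence-level optimization can be illustrated with a minimal sketch. The first function is the standard DPO loss for a single preference pair, computed from summed response log-probabilities. The second, `sentence_weighted_dpo_loss`, is a hypothetical illustration and not the paper's formulation: it assumes each sentence's policy-versus-reference log-ratio is scaled by a per-sentence weight before summing. The weights are plain inputs here; how ASPO actually derives its adaptive rewards from model predictions is not given in the text above, so that step is left out.

```python
import math
from typing import List


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(pol_chosen: float, ref_chosen: float,
             pol_rejected: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard response-level DPO loss for one preference pair.

    Each argument is the summed log-probability of a whole response under
    the policy (pol_*) or the frozen reference model (ref_*).
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def sentence_weighted_dpo_loss(pol_chosen: List[float], ref_chosen: List[float],
                               w_chosen: List[float],
                               pol_rejected: List[float], ref_rejected: List[float],
                               w_rejected: List[float],
                               beta: float = 0.1) -> float:
    """Hypothetical sentence-level variant (an assumption, not ASPO itself).

    Inputs are per-sentence log-probabilities and per-sentence weights.
    Each sentence's log-ratio is scaled by its weight before summing, so
    individual sentences can contribute more or less to the margin.
    """
    chosen = sum(w * (p - r) for w, p, r in zip(w_chosen, pol_chosen, ref_chosen))
    rejected = sum(w * (p - r) for w, p, r in zip(w_rejected, pol_rejected, ref_rejected))
    return -log_sigmoid(beta * (chosen - rejected))


if __name__ == "__main__":
    # Toy per-sentence log-probabilities (made-up numbers for illustration).
    pc, rc = [-1.0, -2.0], [-1.5, -2.5]
    pr, rr = [-3.0], [-2.0]
    print(dpo_loss(sum(pc), sum(rc), sum(pr), sum(rr)))
    print(sentence_weighted_dpo_loss(pc, rc, [1.0, 1.0], pr, rr, [1.0]))
```

With every weight set to 1, the weighted variant reduces to the standard response-level loss, which makes the binary treatment of whole responses a special case of the sentence-level view.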

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Multi-Agent Systems and Negotiation

Methods: Softmax · Attention Is All You Need · Direct Preference Optimization