InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback

Boyuan Chen; Donghai Hong; Jiaming Ji; Jiacheng Zheng; Bowen Dong; Jiayi Zhou; Kaile Wang; Juntao Dai; Xuyao Wang; Wenqi Chen; Qirui Zheng; Wenxin Li; Sirui Han; Yike Guo; Yaodong Yang

arXiv:2505.23950 · cs.AI · December 23, 2025



TL;DR

This paper introduces InterMT, a dataset and benchmark for multi-turn, multimodal interaction aligned with human preferences, aiming to enhance multimodal large models' interactive capabilities.

Contribution

It presents the first preference dataset for multi-turn multimodal interaction, incorporating human feedback and expert annotations, and introduces tools and benchmarks to evaluate and improve MLLMs' interactive abilities.

Findings

01 The InterMT dataset contains 15.6k prompts and 52.6k dialogue instances.

02 InterMT-Bench evaluates how well MLLMs assist users in multi-turn multimodal interactions.

03 The open-source data facilitates future research on multimodal alignment.

Abstract

As multimodal large models (MLLMs) continue to advance across challenging tasks, a key question emerges: what essential capabilities are still missing? A critical aspect of human learning is continuous interaction with the environment, which involves not only language but also multimodal understanding and generation. To move closer to human-level intelligence, models must similarly support multi-turn, multimodal interaction. In particular, they should comprehend interleaved multimodal contexts and respond coherently in ongoing exchanges. In this work, we present an initial exploration through InterMT, the first preference dataset for multi-turn multimodal interaction grounded in real human feedback. In this exploration, we particularly emphasize the importance of human oversight, introducing expert annotations to guide the process, motivated by the fact that current MLLMs…
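To make "multi-turn interleaved preference data" more concrete, the following is a minimal Python sketch under a hypothetical schema: each turn holds an interleaved user prompt (text and image segments) plus several candidate responses, each carrying a single scalar human rating. The class names, the field names (`segments`, `score`, `candidates`), and the helper `preference_pairs` are all illustrative assumptions, not the released InterMT format, and turning per-candidate scores into pairs by skipping ties is a swapped-in simplification rather than the paper's annotation protocol.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple, Union

# Hypothetical schema for illustration only; NOT the official InterMT format.

@dataclass
class TextSegment:
    text: str

@dataclass
class ImageSegment:
    path: str  # hypothetical: reference to an image file

Segment = Union[TextSegment, ImageSegment]

@dataclass
class Response:
    segments: List[Segment]  # a response may itself interleave text and images
    score: float             # hypothetical: one scalar human rating per candidate

@dataclass
class Turn:
    prompt: List[Segment]       # interleaved user input for this turn
    candidates: List[Response]  # alternative model responses at this turn

@dataclass
class Dialogue:
    turns: List[Turn] = field(default_factory=list)

def preference_pairs(dialogue: Dialogue) -> List[Tuple[int, Response, Response]]:
    """Return (turn_index, chosen, rejected) for every candidate pair with distinct scores."""
    pairs = []
    for i, turn in enumerate(dialogue.turns):
        for a, b in combinations(turn.candidates, 2):
            if a.score == b.score:
                continue  # ties carry no preference signal, so they are skipped
            chosen, rejected = (a, b) if a.score > b.score else (b, a)
            pairs.append((i, chosen, rejected))
    return pairs

# Usage: a two-turn dialogue whose second turn asks for image output.
d = Dialogue(turns=[
    Turn(prompt=[TextSegment("Describe this chart."), ImageSegment("chart.png")],
         candidates=[Response([TextSegment("A bar chart of monthly sales.")], 4.0),
                     Response([TextSegment("An image.")], 2.0)]),
    Turn(prompt=[TextSegment("Now draw it as a pie chart.")],
         candidates=[Response([ImageSegment("pie_a.png")], 3.0),
                     Response([ImageSegment("pie_b.png")], 3.0),
                     Response([TextSegment("I cannot do that.")], 1.0)]),
])
pairs = preference_pairs(d)
print(len(pairs))  # 3: one pair from turn 0, two from turn 1 (the tie is dropped)
```

Converting per-turn ratings into (chosen, rejected) pairs is one common way to feed preference data into reward-model or DPO-style training; keeping the turn index lets later turns be conditioned on the full interleaved history that precedes them.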


Code & Models

Datasets

PKU-Alignment/InterMT
