MORE-R1: Guiding LVLM for Multimodal Object-Entity Relation Extraction via Stepwise Reasoning with Reinforcement Learning

Xiang Yuan; Xu Chu; Xinrong Chen; Haochen Li; Zonghong Dai; Hongcheng Fan; Xiaoyue Yuan; Weiping Li; Tong Mo

arXiv:2603.09478·cs.MM·March 11, 2026

TL;DR

This paper introduces MORE-R1, an LVLM-based model that uses explicit stepwise reasoning trained with reinforcement learning to improve multimodal object-entity relation extraction, achieving state-of-the-art results on the MORE benchmark.

Contribution

The paper proposes a model trained in two stages, a cold-start Supervised Fine-Tuning (SFT) stage followed by a Reinforcement Learning (RL) stage, so that an LVLM reasons step by step when extracting relations between visual objects and textual entities.

Findings

01 Achieves state-of-the-art performance on the MORE benchmark.

02 Demonstrates significant improvement over existing methods.

03 Effectively models complex reasoning in multimodal scenarios.

Abstract

Multimodal Object-Entity Relation Extraction (MORE) is a challenging task in information extraction research. It aims to identify relations between visual objects and textual entities, which requires complex multimodal understanding and cross-modal reasoning abilities. Existing methods, mainly classification-based or generation-based without reasoning, struggle to handle complex extraction scenarios in the MORE task and suffer from limited scalability and a lack of transparency in intermediate reasoning. To address these challenges, we propose MORE-R1, a novel model that introduces explicit stepwise reasoning with Reinforcement Learning (RL) to enable a Large Vision-Language Model (LVLM) to address the MORE task effectively. MORE-R1 integrates a two-stage training process, including an initial cold-start training stage with Supervised Fine-Tuning (SFT) and a subsequent RL stage for reasoning ability…
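
To make the idea of rewarding explicit stepwise reasoning concrete, the following is a minimal sketch of a rule-based reward of the kind commonly used in R1-style RL training. It is not the paper's reward: the `<think>`/`<answer>` template, the `MoreSample` fields, the weights, and the relation label in the usage note are all assumptions made for illustration, since the abstract does not specify them.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed output template: reasoning inside <think>, final relation inside <answer>.
# The tag names are hypothetical and not taken from the paper.
_TEMPLATE = re.compile(
    r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL
)


@dataclass(frozen=True)
class MoreSample:
    """One hypothetical MORE example: a textual entity, a visual object, a gold relation."""
    text: str                               # sentence containing the entity
    entity: str                             # textual entity mention
    object_box: Tuple[int, int, int, int]   # visual object as (x1, y1, x2, y2)
    relation: str                           # gold relation label


def parse_response(response: str) -> Optional[Tuple[str, str]]:
    """Return (reasoning, answer) if the response follows the template, else None."""
    match = _TEMPLATE.match(response.strip())
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def reward(
    response: str,
    sample: MoreSample,
    format_weight: float = 0.5,     # assumed weight for following the template
    accuracy_weight: float = 1.0,   # assumed weight for a correct relation
) -> float:
    """Format reward plus accuracy reward; 0.0 when the template is violated."""
    parsed = parse_response(response)
    if parsed is None:
        return 0.0
    _, answer = parsed
    score = format_weight
    if answer.lower() == sample.relation.lower():
        score += accuracy_weight
    return score
```

Under these assumptions, a response such as `<think>The man in the box is holding the trophy named in the caption.</think><answer>/per/misc/awarded</answer>` would earn both the format and the accuracy component when the gold label matches, only the format component when the label is wrong, and nothing when the tags are missing.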

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks