Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

Huanjin Yao; Jiaxing Huang; Wenhao Wu; Jingyi Zhang; Yibo Wang; Shunyu Liu; Yingjie Wang; Yuxin Song; Haocheng Feng; Li Shen; Dacheng Tao

arXiv:2412.18319·cs.CV·January 3, 2025

TL;DR

This paper introduces CoMCTS, a collective learning method that lets multimodal large language models (MLLMs) reason and reflect explicitly, step by step. The authors use it to build the Mulberry-260k dataset and report superior benchmark performance for models trained on it.

Contribution

We propose CoMCTS, a collective Monte Carlo Tree Search method, and create Mulberry-260k, a dataset for training MLLMs with explicit reasoning and reflection capabilities.

Findings

1. CoMCTS improves reasoning accuracy on benchmarks.
2. Mulberry-260k enables effective training of reasoning models.
3. Our models outperform existing methods on multiple tasks.

Abstract

In this work, we aim to develop an MLLM that understands and solves questions by learning to create each intermediate step of the reasoning involved until the final answer. To this end, we propose Collective Monte Carlo Tree Search (CoMCTS), a new learning-to-reason method for MLLMs, which introduces the concept of collective learning into "tree search" for effective and efficient reasoning-path searching and learning. The core idea of CoMCTS is to leverage collective knowledge from multiple models to collaboratively conjecture, search, and identify effective reasoning paths toward correct answers via four iterative operations: Expansion, Simulation and Error Positioning, Backpropagation, and Selection. Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit, and well-defined reasoning nodes for each question. With Mulberry-260k, we perform…
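The abstract names the four CoMCTS operations but this page does not spell out how each one works, so the following is only a toy sketch of how such a loop could be wired together. It is not the authors' implementation. Everything in it is an assumption made for illustration: the "models" are plain Python functions (`PROPOSERS`, `SCORERS`), a reasoning state is just an integer, error positioning is approximated by a fixed score `threshold`, and selection uses a standard UCB rule.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass(eq=False)
class Node:
    state: int                           # toy "reasoning state": a running total
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0


def expand(node: Node, proposers: Sequence[Callable[[int], int]]) -> List[Node]:
    """Expansion: every model in the collective proposes one next step."""
    candidates, seen = [], set()
    for propose in proposers:
        state = propose(node.state)
        if state not in seen:            # drop duplicate proposals
            seen.add(state)
            candidates.append(Node(state=state, parent=node))
    return candidates


def simulate_and_filter(candidates, scorers, threshold):
    """Simulation and error positioning (assumed form): average the scores
    from all models and discard candidates that fall below the threshold."""
    kept = []
    for cand in candidates:
        cand.value = sum(score(cand.state) for score in scorers) / len(scorers)
        if cand.value >= threshold:
            kept.append(cand)
    return kept


def backpropagate(leaf: Node) -> None:
    """Backpropagation: fold the leaf's score into every ancestor's mean."""
    leaf.visits = 1
    node = leaf.parent
    while node is not None:
        node.visits += 1
        node.value += (leaf.value - node.value) / node.visits
        node = node.parent


def select(root: Node, c: float = 1.0) -> Node:
    """Selection: descend to a leaf, following the highest UCB score."""
    node = root
    while node.children:
        parent = node
        node = max(
            parent.children,
            key=lambda ch: ch.value
            + c * math.sqrt(math.log(parent.visits + 1) / ch.visits),
        )
    return node


def path_to(node: Optional[Node]) -> List[int]:
    states = []
    while node is not None:
        states.append(node.state)
        node = node.parent
    return states[::-1]


def search(start, target, proposers, scorers, iterations=10, threshold=0.55):
    root = Node(state=start)
    for _ in range(iterations):
        node = select(root)
        kept = simulate_and_filter(expand(node, proposers), scorers, threshold)
        for child in kept:
            node.children.append(child)
            backpropagate(child)
            if child.state == target:
                return path_to(child)
    return None                          # no path found within the budget


TARGET = 5
# Hypothetical stand-ins for the models; the third proposer is deliberately poor.
PROPOSERS = [lambda s: s + 1, lambda s: s + 2, lambda s: s * 2]
SCORERS = [
    lambda s: max(0.0, 1.0 - abs(TARGET - s) / TARGET),  # closeness to target
    lambda s: 0.0 if s > TARGET else 1.0,                # penalize overshoot
]

result = search(0, TARGET, PROPOSERS, SCORERS)
print(result)
```

In this toy setup the weak proposer still contributes candidates, but the joint scoring prunes its overshooting steps before they enter the tree. That is the intuition the abstract attributes to collective search; the real method operates on multimodal reasoning steps produced by MLLMs rather than on integers.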

Code & Models

Datasets

HuanjinYao/Mulberry-SFT (dataset, 181 downloads)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Shrink and Fine-Tune