Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Huanjin Yao, Jiaxing Huang, Wenhao Wu, Jingyi Zhang, Yibo Wang, Shunyu, Liu, Yingjie Wang, Yuxin Song, Haocheng Feng, Li Shen, Dacheng Tao

TL;DR
This paper introduces CoMCTS, a novel collective learning method for multimodal large language models, enabling explicit reasoning and reflection, demonstrated through the Mulberry-260k dataset and superior benchmark performance.
Contribution
We propose CoMCTS, a collective Monte Carlo Tree Search method, and create Mulberry-260k, a dataset for training MLLMs with explicit reasoning and reflection capabilities.
Findings
CoMCTS improves reasoning accuracy on benchmarks.
Mulberry-260k enables effective training of reasoning models.
Our models outperform existing methods on multiple tasks.
Abstract
In this work, we aim to develop an MLLM that understands and solves questions by learning to create each intermediate step of the reasoning involved till the final answer. To this end, we propose Collective Monte Carlo Tree Search (CoMCTS), a new learning-to-reason method for MLLMs, which introduces the concept of collective learning into ``tree search'' for effective and efficient reasoning-path searching and learning. The core idea of CoMCTS is to leverage collective knowledge from multiple models to collaboratively conjecture, search and identify effective reasoning paths toward correct answers via four iterative operations including Expansion, Simulation and Error Positioning, Backpropagation, and Selection. Using CoMCTS, we construct Mulberry-260k, a multimodal dataset with a tree of rich, explicit and well-defined reasoning nodes for each question. With Mulberry-260k, we perform…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques
MethodsShrink and Fine-Tune
