MoCo: A One-Stop Shop for Model Collaboration Research

Shangbin Feng; Yuyang Bai; Ziyuan Yang; Yike Wang; Zhaoxuan Tan; Jiajie Yan; Zhenyu Lei; Wenxuan Ding; Weijia Shi; Haojin Wang; Zhenting Qi; Yuru Jiang; Heng Wang; Chengsong Huang; Yu Fei; Jihan Yao; Yilun Du; Luke Zettlemoyer; Yejin Choi; Yulia Tsvetkov

arXiv:2601.21257 · cs.CL · April 21, 2026

TL;DR

MoCo is a comprehensive Python library for benchmarking and comparing 26 model collaboration methods across diverse tasks; experiments with it show that collaboration frequently improves performance over non-collaborative models.

Contribution

This paper introduces MoCo, the first unified toolkit for executing, benchmarking, and analyzing a wide range of model collaboration algorithms at scale.

Findings

1. Most collaboration strategies outperform non-collaborative models in 61% of settings.

2. The most effective methods outperform non-collaborative models by up to 25.8%.

3. Model collaboration can solve problems on which single LMs struggle.

Abstract

Advancing beyond single monolithic language models (LMs), recent research increasingly recognizes the importance of model collaboration, where multiple LMs collaborate, compose, and complement each other. Existing research on this topic has been largely disparate and disconnected, spread across different research communities, and lacks rigorous comparison. To consolidate existing research and establish model collaboration as a school of thought, we present MoCo: a one-stop Python library for executing, benchmarking, and comparing model collaboration algorithms at scale. MoCo features 26 model collaboration methods, spanning diverse levels of cross-model information exchange such as routing, text, logit, and model parameters. MoCo integrates 25 evaluation datasets spanning reasoning, QA, code, safety, and more, and users can flexibly bring their own data. Extensive experiments with MoCo…
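
To make the four levels of cross-model information exchange named in the abstract (routing, text, logit, and model parameters) more concrete, here is a toy sketch in Python, the library's own language. Everything in it is hypothetical: the function names, the use of plain callables and lists of numbers in place of real LMs, and the particular technique picked for each level (a single router choice, draft-then-revise text exchange, averaging next-token probabilities over a shared vocabulary, and uniform weight averaging) are illustrative assumptions, not MoCo's API or the paper's specific methods.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy stand-ins: "models" are plain callables or lists of numbers,
# not real LMs and not MoCo's API.


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one model's next-token logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# 1) Routing: a router picks one model per query; only that model answers.
def route(query: str,
          models: Dict[str, Callable[[str], str]],
          router: Callable[[str], str]) -> str:
    return models[router(query)](query)


# 2) Text: models exchange generated text (here, one drafts, another revises).
def draft_then_revise(query: str,
                      drafter: Callable[[str], str],
                      reviser: Callable[[str, str], str]) -> str:
    draft = drafter(query)
    return reviser(query, draft)


# 3) Logit: average the models' next-token probability distributions.
#    Assumes every model scores the same shared vocabulary in the same order.
def average_next_token_probs(all_logits: List[List[float]]) -> List[float]:
    probs = [softmax(logits) for logits in all_logits]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]


# 4) Parameters: uniform weight averaging of models with identical architecture
#    (parameter names and lengths must match across all models).
def average_parameters(
        params: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    n = len(params)
    return {
        name: [sum(p[name][i] for p in params) / n
               for i in range(len(params[0][name]))]
        for name in params[0]
    }


if __name__ == "__main__":
    experts = {"math": lambda q: "math:" + q, "chat": lambda q: "chat:" + q}
    pick = lambda q: "math" if any(c.isdigit() for c in q) else "chat"
    print(route("2+2", experts, pick))  # math:2+2
    print(draft_then_revise("hi", lambda q: "draft", lambda q, d: d.upper()))  # DRAFT
    print(average_next_token_probs([[2.0, 0.0], [0.0, 2.0]]))  # approximately [0.5, 0.5]
    print(average_parameters([{"w": [1.0, 3.0]}, {"w": [3.0, 5.0]}]))  # {'w': [2.0, 4.0]}
```

The sketch also shows why these levels demand increasingly tight coupling between models: routing needs only each model's final output, text exchange needs models to read each other's generations, logit averaging needs a shared vocabulary, and parameter averaging needs matching architectures.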

Code & Models

Repositories

bunsenfeng/model_collaboration (GitHub)