Efficient LLM Collaboration via Planning
Byeongchan Lee, Jonghoon Lee, Dongyoung Kim, Jaehyung Kim, Kyungjoon Park, Dongjun Lee, Jinwoo Shin

TL;DR
The paper introduces COPE, a planning-based collaboration framework enabling small and large language models to work together efficiently, achieving high performance at reduced inference costs across various tasks.
Contribution
The paper presents a test-time collaboration method in which small and large models alternate roles as planner and executor, lowering inference cost while maintaining performance.
Findings
COPE matches large-model performance on multiple benchmarks.
The framework reduces inference API costs drastically.
Multi-stage planning improves task-solving efficiency.
Abstract
Recently, large language models (LLMs) have demonstrated strong performance on tasks ranging from simple to complex. However, while large models achieve remarkable results across diverse tasks, they often incur substantial monetary inference cost, making frequent use impractical for many applications. In contrast, small models are often freely available and easy to deploy locally, but their performance on complex tasks remains limited. This trade-off raises a natural question: how can small and large models efficiently collaborate to combine their complementary strengths? To address this trade-off, we propose COPE, a test-time collaboration framework. A planner model first generates a plan, a lightweight intermediate that guides a downstream executor model. Small and large models take turns acting as planner and executor, exchanging plans in a multi-stage cascade to…
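The planner/executor cascade described in the abstract can be illustrated with a minimal sketch. The abstract is truncated here, so the details below are assumptions rather than the authors' method: the stage ordering, the agreement-based acceptance rule, the `samples` and `threshold` parameters, the prompt strings, and the stub models are all hypothetical and stand in for real model calls.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A "model" is any callable from prompt text to completion text (assumption).
Model = Callable[[str], str]


@dataclass
class Stage:
    name: str
    planner: Model   # writes the plan
    executor: Model  # solves the task, guided by the plan


def run_stage(stage: Stage, task: str, samples: int) -> Tuple[str, float]:
    """Sample several plan-then-answer runs; return the majority answer and its vote share."""
    answers = []
    for _ in range(samples):
        plan = stage.planner(f"Write a short plan for: {task}")
        answers.append(stage.executor(f"Task: {task}\nPlan: {plan}\nAnswer:"))
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / samples


def cascade(task: str, stages: List[Stage], samples: int = 4,
            threshold: float = 0.75) -> Tuple[str, str]:
    """Try stages in order; accept the first whose answers agree often enough.

    The agreement threshold is a hypothetical acceptance rule, not taken
    from the paper. If no stage is accepted, the last stage's answer is used.
    """
    if not stages:
        raise ValueError("at least one stage is required")
    for stage in stages:
        answer, agreement = run_stage(stage, task, samples)
        if agreement >= threshold:
            return answer, stage.name
    return answer, stages[-1].name


def build_demo_stages() -> List[Stage]:
    """Deterministic stub models so the sketch runs without any API access."""
    calls = {"n": 0}

    def small(prompt: str) -> str:
        calls["n"] += 1
        if prompt.startswith("Write a short plan"):
            return "guess"                 # weak plan from the small model
        if "add the numbers" in prompt:
            return "4"                     # a good plan makes it consistent
        return str(calls["n"])             # otherwise its answers disagree

    def large(prompt: str) -> str:
        if prompt.startswith("Write a short plan"):
            return "add the numbers"       # useful plan from the large model
        return "4"

    return [
        Stage("small-plans/small-executes", small, small),
        Stage("large-plans/small-executes", large, small),
        Stage("large-plans/large-executes", large, large),
    ]


if __name__ == "__main__":
    print(cascade("What is 2 + 2?", build_demo_stages()))
```

With these stubs, the first stage is rejected because the small model's answers disagree, and the second stage is accepted once the large model supplies the plan. The large model is then called only for short plans, which is the cost-saving idea the abstract points to.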
