Experts are all you need: A Composable Framework for Large Language Model Inference
Shrihari Sridharan, Sourjya Roy, Anand Raghunathan, Kaushik Roy

TL;DR
Comp-LLM introduces a composable inference framework for large language models that enhances reasoning, reduces model size, and improves latency by enabling cross-expert collaboration through a dependency graph.
Contribution
It presents a novel framework that decomposes queries into sub-tasks, assigns them to experts, and efficiently combines responses, addressing limitations of MoEs and multi-agent systems.
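The decompose, assign, and combine flow described above can be sketched as a small scheduler over a sub-query dependency graph. This is a minimal illustration under stated assumptions, not Comp-LLM's implementation. The sub-query ids, the expert names, the `call_expert` stub, and the convention of passing parent answers as prompt context are all hypothetical. Combining responses is modeled as one more sub-query that depends on the others. The sketch uses Python's standard `graphlib.TopologicalSorter` to release sub-queries whose dependencies have been answered, and `asyncio.gather` to dispatch each such wave concurrently.

```python
import asyncio
from graphlib import TopologicalSorter


async def call_expert(expert: str, prompt: str) -> str:
    """Hypothetical stand-in for an expert model call; a real system would run inference here."""
    await asyncio.sleep(0.01)  # simulate per-call inference latency
    return f"{expert}: {prompt}"


def build_prompt(text: str, parents: list[str], answers: dict[str, str]) -> str:
    """Append the answers of parent sub-queries as context (a hypothetical convention)."""
    if not parents:
        return text
    context = "; ".join(answers[p] for p in parents)
    return f"{text} | context: {context}"


async def run_graph(
    subqueries: dict[str, tuple[str, str]],
    deps: dict[str, set[str]],
) -> tuple[dict[str, str], list[tuple[str, ...]]]:
    """Answer sub-queries in dependency order, dispatching each ready wave concurrently.

    subqueries maps a sub-query id to (expert name, sub-query text);
    deps maps a sub-query id to the ids whose answers it needs.
    Returns the per-sub-query answers and the waves in execution order.
    """
    graph = {q: deps.get(q, set()) for q in subqueries}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    answers: dict[str, str] = {}
    waves: list[tuple[str, ...]] = []

    while sorter.is_active():
        # Every sub-query returned here has all of its parents answered.
        ready = tuple(sorted(sorter.get_ready()))
        waves.append(ready)
        prompts = [build_prompt(subqueries[q][1], sorted(graph[q]), answers) for q in ready]
        # Independent sub-queries in the same wave run concurrently.
        results = await asyncio.gather(
            *(call_expert(subqueries[q][0], p) for q, p in zip(ready, prompts))
        )
        for q, answer in zip(ready, results):
            answers[q] = answer
            sorter.done(q)

    return answers, waves
```

With three hypothetical sub-queries where `q3` needs the answers of `q1` and `q2`, the scheduler produces two waves: `q1` and `q2` together, then `q3`. Dispatching in waves is a simplification. A sub-query could instead start as soon as its own parents finish, which is tighter when branches have uneven latencies.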
Findings
Up to 11.01% accuracy improvement over monolithic LLMs.
Model size reduced by 1.67x to 3.56x without significant performance loss.
Latency improved by 1.1x to 1.7x compared to sequential sub-query processing.
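The latency finding compares against processing sub-queries one after another. A back-of-the-envelope model shows where such a gain can come from. When sub-queries are processed sequentially, total latency is the sum of the per-sub-query latencies. When independent sub-queries run concurrently, total latency is set by the longest dependency chain (the critical path). The sketch below assumes unlimited parallel capacity and ignores decomposition and combination overhead. The latencies are made-up illustrative numbers, not measurements from the paper.

```python
from graphlib import TopologicalSorter


def sequential_latency(latency: dict[str, float]) -> float:
    """Total time when sub-queries are answered one after another."""
    return sum(latency.values())


def critical_path_latency(latency: dict[str, float], deps: dict[str, set[str]]) -> float:
    """Total time when each sub-query starts as soon as all its parents finish.

    Assumes unlimited parallel capacity and zero scheduling overhead.
    """
    finish: dict[str, float] = {}
    graph = {q: deps.get(q, set()) for q in latency}
    for q in TopologicalSorter(graph).static_order():  # parents are yielded first
        start = max((finish[p] for p in graph[q]), default=0.0)
        finish[q] = start + latency[q]
    return max(finish.values())


# Illustrative per-sub-query latencies in seconds (hypothetical values).
latency = {"q1": 2.0, "q2": 3.0, "q3": 1.0}
deps = {"q3": {"q1", "q2"}}  # q3 needs the answers of q1 and q2

seq = sequential_latency(latency)           # 6.0
par = critical_path_latency(latency, deps)  # 4.0
print(f"speedup: {seq / par:.2f}x")         # speedup: 1.50x
```

With these illustrative numbers the ratio is 6.0 / 4.0 = 1.5. A pure chain of dependencies gives no gain under this model, so the achievable speedup depends on how much of the graph is independent and on the overheads the model leaves out.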
Abstract
Large Language Models (LLMs) have achieved state-of-the-art accuracies in a variety of natural language processing (NLP) tasks. However, this success comes at the cost of increased model sizes, which lead to additional computational burden. Mixture of Experts (MoEs) overcome this bottleneck by decoupling model capacity from computation, activating only a subset of parameters or "experts". However, these models require joint pretraining of these experts along with the router and do not model multi-step reasoning. In contrast, multi-agent frameworks improve reasoning by decomposing complex problems into modular subtasks. However, these frameworks rely on sequential "plan–act–observe" loops, which introduce significant latency. Our work, Comp-LLM, addresses these challenges by introducing a composable inference framework that enables cross-expert collaboration via an explicit sub-query…
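For contrast with Comp-LLM, the abstract's description of a conventional MoE layer, which activates only a subset of experts per input, can be illustrated with top-k gating, a common routing scheme. This is a generic sketch of that scheme and not anything from the paper. The scalar input, the `router` callable (in practice a learned layer trained jointly with the experts), and the toy experts are simplifying assumptions.

```python
import math
from typing import Callable


def top_k_gate(logits: list[float], k: int) -> dict[int, float]:
    """Keep the k largest router logits and softmax-normalize over just those."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def moe_forward(
    x: float,
    router: Callable[[float], list[float]],
    experts: list[Callable[[float], float]],
    k: int,
) -> float:
    """Weighted sum of the k selected experts; unselected experts are never evaluated."""
    gates = top_k_gate(router(x), k)
    return sum(w * experts[i](x) for i, w in gates.items())
```

Only the k chosen experts run for a given input, which is how per-input compute stays small as the total number of experts grows. Training the router and experts jointly, which the abstract names as a limitation, is outside the scope of this sketch.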
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
