Fusing Models with Complementary Expertise
Hongyi Wang, Felipe Maia Polo, Yuekai Sun, Souvik Kundu, Eric Xing, Mikhail Yurochkin

TL;DR
This paper introduces a supervised learning approach for fusing the outputs of multiple expert models with complementary knowledge. It improves performance across a range of AI tasks and extends to a "frugal" setting that reduces the number of expert model evaluations.
Contribution
The paper formulates the Fusion of Experts problem as supervised learning, applicable to both discriminative and generative tasks, with an extension that reduces the number of expert evaluations at test time.
Findings
Significant performance improvements in image and text classification.
Enhanced results in text summarization and multiple-choice QA.
Effective in automatic evaluation of generated text.
Abstract
Training AI models that generalize across tasks and domains has long been among the open problems driving AI research. The emergence of Foundation Models made it easier to obtain expert models for a given task, but the heterogeneity of data that may be encountered at test time often means that any single expert is insufficient. We consider the Fusion of Experts (FoE) problem of fusing outputs of expert models with complementary knowledge of the data distribution and formulate it as an instance of supervised learning. Our method is applicable to both discriminative and generative tasks and leads to significant performance improvements in image and text classification, text summarization, multiple-choice QA, and automatic evaluation of generated text. We also extend our method to the "frugal" setting where it is desired to reduce the number of expert model evaluations at test time. Our…
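As a minimal sketch of the discriminative case, assume each expert returns a class-probability vector for an input, the fuser's input is the concatenation of those vectors, and the fuser is trained on labeled examples to predict the true label (the procedure one reviewer summarizes in the reviews). The softmax-regression fuser, the concatenation, the synthetic two-domain data, and every name in the code (`fuse_features`, `train_fuser`, `toy_expert_output`, and so on) are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List, Sequence, Tuple

def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values: Sequence[float]) -> int:
    # Ties resolve to the lowest index.
    return max(range(len(values)), key=lambda k: values[k])

def fuse_features(expert_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate every expert's class-probability vector into one fuser input."""
    return [p for probs in expert_outputs for p in probs]

def scores_for(W, b, x) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) + bc for row, bc in zip(W, b)]

def train_fuser(X, Y, num_classes, lr=0.5, epochs=200):
    """Assumed fuser: softmax regression fit by full-batch gradient descent on cross-entropy."""
    dim, n = len(X[0]), len(X)
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(num_classes)]
        gb = [0.0] * num_classes
        for x, y in zip(X, Y):
            probs = softmax(scores_for(W, b, x))
            for c in range(num_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # d(loss)/d(score_c)
                gb[c] += err
                for j in range(dim):
                    gW[c][j] += err * x[j]
        for c in range(num_classes):
            b[c] -= lr * gb[c] / n
            for j in range(dim):
                W[c][j] -= lr * gW[c][j] / n
    return W, b

def predict(W, b, x) -> int:
    return argmax(scores_for(W, b, x))

# Hypothetical binary toy experts: expert e is confident and correct on domain e
# and outputs a flat (uninformative) distribution on the other domain.
def toy_expert_output(in_domain: bool, label: int, conf: float) -> List[float]:
    if not in_domain:
        return [0.5, 0.5]
    probs = [1.0 - conf, 1.0 - conf]
    probs[label] = conf
    return probs

examples: List[Tuple[List[List[float]], int]] = []
for domain in (0, 1):
    for label in (0, 1):
        for conf in (0.7, 0.8, 0.9):
            outputs = [toy_expert_output(domain == e, label, conf) for e in (0, 1)]
            examples.append((outputs, label))

X = [fuse_features(outs) for outs, _ in examples]
Y = [y for _, y in examples]
W, b = train_fuser(X, Y, num_classes=2)

expert_acc = [sum(argmax(outs[e]) == y for outs, y in examples) / len(Y) for e in (0, 1)]
fused_acc = sum(predict(W, b, x) == y for x, y in zip(X, Y)) / len(Y)
print(expert_acc, fused_acc)  # [0.75, 0.75] 1.0
```

On this toy data each expert alone is right only in its own domain (its flat output elsewhere ties and resolves to class 0), while the trained fuser is right everywhere: the flat output of the out-of-domain expert adds nothing to the fused score, so the in-domain expert decides.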
Peer Reviews
Decision: ICLR 2024 poster
* The proposed method offers an interesting direction: multiple models can be trained sequentially, each on the residuals of the previous mixture of experts. * Compared with pure residual approaches, the transformation function is applied on top of each model's outputs, so this approach may be more general than the pure residual-learning setting.
* Unlike Sparse MoE, the proposed algorithm requires multiple experts to be used together, so its inference cost is a multiple of that of a single expert. Therefore, the correct baseline is a model whose parameter count equals the sum of all individual experts' parameters. The authors should make this comparison in their paper. * Similar to the first point, the authors provide experimental results on a variety of datasets; however, they did not include many c
- The paper tackles a novel and significant problem: optimally leveraging different models for diverse tasks. While foundation models generally perform well across various tasks, they have differing strengths, so an approach that combines these models effectively represents a significant advance. - The methodology, FrugalFoE, is both technically sound and innovative. It offers a clear problem formulation and further reduces the need to query all the expert models without sacrificing accura
- A primary limitation is the assumption that data for training the fuser are readily available. The proposed approach requires a labeled dataset to train the fuser network. In the discriminative case, we feed the input example to different individual models, take model outputs as the inputs to the fuser network, and train the fuser to predict the label of the input example. Similarly, in the generative case, we feed the fuser network with individual model outputs and train the fuser to predict
* The approach is relevant given the availability of pre-trained models nowadays. Methods that fuse models effectively can have impact, since they avoid retraining and can incorporate new knowledge into previously trained models. * The experimental section is very complete and explores several domains of interest. I especially highlight the experiments with LLMs. * The mathematical derivations are sound. * The paper is written in clear language, with very few typos.
* I found the explanation of FrugalFoE harder to follow than the rest. See questions below.
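The frugal setting asks for fewer expert evaluations at test time. The FrugalFoE procedure itself is not spelled out on this page, so the sketch below substitutes a much simpler and plainly different technique: a confidence-threshold cascade that queries experts one at a time in a fixed order and stops once the averaged probabilities of the experts queried so far reach a threshold. The toy experts, the averaging rule, the threshold value, and all names (`make_toy_expert`, `cascade_predict`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]  # toy input: (domain name, true label)
Expert = Callable[[Example], List[float]]

def make_toy_expert(domain: str, num_classes: int = 2) -> Expert:
    """Hypothetical expert: confident and correct on its own domain, uniform elsewhere."""
    def expert(x: Example) -> List[float]:
        x_domain, label = x
        if x_domain != domain:
            return [1.0 / num_classes] * num_classes
        probs = [0.1 / (num_classes - 1)] * num_classes
        probs[label] = 0.9
        return probs
    return expert

def cascade_predict(x: Example, experts: Sequence[Expert], threshold: float) -> Tuple[int, int]:
    """Stand-in for frugal inference (NOT FrugalFoE): query experts in order, stop early.

    Returns (predicted label, number of expert evaluations). Assumes experts is non-empty.
    """
    queried: List[List[float]] = []
    avg: List[float] = []
    for expert in experts:
        queried.append(expert(x))  # one (possibly expensive) expert evaluation
        avg = [sum(col) / len(queried) for col in zip(*queried)]
        if max(avg) >= threshold:  # confident enough: skip the remaining experts
            break
    label = max(range(len(avg)), key=lambda k: avg[k])
    return label, len(queried)

experts = [make_toy_expert("news"), make_toy_expert("code")]
inputs: List[Example] = [("news", 0), ("news", 1), ("code", 1), ("code", 0)]
results = [cascade_predict(x, experts, threshold=0.65) for x in inputs]
print(results)  # [(0, 1), (1, 1), (1, 2), (0, 2)]
print(sum(calls for _, calls in results), "expert calls instead of", len(inputs) * len(experts))
```

Here the four inputs need 6 expert calls instead of 8. The expert order and the threshold are the knobs in this stand-in: a lower threshold saves more calls at the risk of stopping on a weak expert.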
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Multimodal Machine Learning Applications
