No Single Best Model for Diversity: Learning a Router for Sample Diversity
Yuhan Liu, Fangyuan Xu, Vishakh Padmakumar, Daphne Ippolito, Eunsol Choi

TL;DR
This paper investigates methods to generate diverse responses from language models, introduces a new diversity metric, and proposes a router to select the best model per query, improving overall answer diversity.
Contribution
It introduces a diversity coverage metric, evaluates 18 LLMs, and develops a router that outperforms single models in generating diverse answers.
Findings
No single model dominates in diversity coverage across prompts.
A trained router that selects a model for each query improves diversity coverage over using a single model.
The router generalizes well to out-of-domain datasets.
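The first two findings can be illustrated with a small sketch. It assumes a table of per-prompt diversity coverage scores for each model; the model names and numbers below are made-up toy values, not results from the paper. The helper names (`best_single_model`, `oracle_route`, `mean_coverage`) are hypothetical. The sketch computes an oracle upper bound (always picking the per-prompt best model), which is what a trained router tries to approximate; it is not the paper's router.

```python
from typing import Dict, List

# Toy, made-up coverage scores: coverage[model][i] is the diversity
# coverage of that model on prompt i. These are NOT paper results.
coverage: Dict[str, List[float]] = {
    "model_a": [0.9, 0.2, 0.5],
    "model_b": [0.3, 0.8, 0.4],
    "model_c": [0.5, 0.5, 0.7],
}


def best_single_model(cov: Dict[str, List[float]]) -> str:
    """Return the model with the highest mean coverage over all prompts."""
    return max(cov, key=lambda m: sum(cov[m]) / len(cov[m]))


def oracle_route(cov: Dict[str, List[float]]) -> List[str]:
    """For each prompt, return the model with the highest coverage."""
    n_prompts = len(next(iter(cov.values())))
    return [max(cov, key=lambda m: cov[m][i]) for i in range(n_prompts)]


def mean_coverage(cov: Dict[str, List[float]], route: List[str]) -> float:
    """Mean coverage obtained when prompt i is answered by route[i]."""
    return sum(cov[m][i] for i, m in enumerate(route)) / len(route)


single = best_single_model(coverage)
route = oracle_route(coverage)
single_score = mean_coverage(coverage, [single] * len(route))
oracle_score = mean_coverage(coverage, route)
```

In this toy table each model wins on a different prompt, so per-prompt selection scores higher on average than the best fixed model. A learned router would predict `route` from the query text alone, without access to the scores.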
Abstract
When posed with prompts that permit a large number of valid answers, comprehensively generating them is the first step towards satisfying a wide range of users. In this paper, we study methods to elicit a comprehensive set of valid responses. To evaluate this, we introduce diversity coverage, a metric that measures the total quality score assigned to the unique answers in the predicted answer set relative to the best possible answer set with the same number of answers. Using this metric, we evaluate 18 LLMs and find that no single model dominates at generating diverse responses to a wide range of open-ended prompts. Yet, for each prompt, there exists a model that significantly outperforms all other models at generating a diverse answer set. Motivated by this finding, we introduce a router that predicts the best model for each query. On NB-Wildchat, our trained router…
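The verbal definition of diversity coverage above can be read as a ratio, sketched below under several assumptions that the abstract does not spell out: answers are deduplicated by case-insensitive exact match, each valid answer has a known quality score, answers missing from the score table count as zero, and "the same number of answers" means the size of the predicted set before deduplication. The function name `diversity_coverage` and the example data are hypothetical.

```python
from typing import Dict, List


def diversity_coverage(predicted: List[str], quality: Dict[str, float]) -> float:
    """Quality mass of unique predicted answers, relative to the best
    achievable set of the same size.

    Assumptions (not confirmed by the source): exact-match dedup after
    lowercasing, keys of `quality` already lowercased, unknown answers
    score 0, and the budget k counts duplicates.
    """
    k = len(predicted)
    if k == 0:
        return 0.0
    unique = {a.strip().lower() for a in predicted}
    achieved = sum(quality.get(a, 0.0) for a in unique)
    # Best possible set of k answers: the k highest-quality valid answers.
    best = sum(sorted(quality.values(), reverse=True)[:k])
    return achieved / best if best > 0 else 0.0


# Made-up quality scores for a toy open-ended prompt.
quality = {"paris": 1.0, "rome": 0.8, "oslo": 0.5, "lima": 0.2}

# A repeated answer wastes one slot of the budget.
with_duplicate = diversity_coverage(["paris", "Paris", "oslo"], quality)
all_distinct = diversity_coverage(["paris", "rome", "oslo"], quality)
```

Under this reading, a duplicate contributes nothing to the numerator while still enlarging the denominator, so repeating answers lowers the score even when every answer is valid.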
