UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization
Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Yueqi Zhang, Jiayi Shi, Chuyi Tan, Boyuan Pan, Yao Hu, Kan Li

TL;DR
UniCBE is a unified evaluation framework for large language models that improves accuracy, efficiency, and scalability by optimizing multiple core objectives through uniformity-driven sampling strategies.
Contribution
It introduces a novel unified framework that simultaneously optimizes key factors in comparing-based evaluation, enhancing accuracy and scalability over existing methods.
Findings
Achieves over 17% savings in evaluation budgets on AlpacaEval.
Attains Pearson correlation exceeding 0.995 with ground truth.
Reduces evaluation costs by over 50% in continuous model deployment scenarios.
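For readers unfamiliar with the correlation metric quoted above, the following is a minimal sketch of how a Pearson correlation between estimated model scores and ground-truth scores can be computed. The function name `pearson` and the score values are illustrative assumptions, not data or code from the paper.

```python
import numpy as np


def pearson(estimated, reference):
    """Pearson correlation between two equal-length score vectors."""
    x = np.asarray(estimated, dtype=float)
    y = np.asarray(reference, dtype=float)
    # Center both vectors, then divide the covariance term by the
    # product of the two standard-deviation terms.
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Made-up scores for four hypothetical models (not results from the paper).
estimated_scores = [0.41, 0.55, 0.62, 0.78]
ground_truth_scores = [0.40, 0.57, 0.60, 0.80]
r = pearson(estimated_scores, ground_truth_scores)
```

A value of `r` close to 1 means the estimated scores track the reference scores almost linearly.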
Abstract
Human preference plays a significant role in measuring large language models and guiding them to align with human values. Unfortunately, current comparing-based evaluation (CBE) methods typically focus on a single optimization objective and therefore fail to make effective use of scarce yet valuable preference signals. To address this, we delve into key factors that can enhance the accuracy, convergence, and scalability of CBE: suppressing sampling bias, balancing the descending process of uncertainty, and mitigating updating uncertainty. Following the derived guidelines, we propose UniCBE, a unified uniformity-driven CBE framework that simultaneously optimizes these core objectives by constructing and integrating three decoupled sampling probability matrices, each designed to ensure uniformity in specific aspects. We further ablate the optimal tuple sampling and preference aggregation strategies to…
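The idea of constructing and integrating several sampling probability matrices can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not say how the matrices are built or combined, so the inverse-count weighting in `uniformity_weights`, the element-wise product in `integrate`, and all function and variable names are hypothetical stand-ins rather than the authors' method.

```python
import numpy as np


def uniformity_weights(counts):
    """Hypothetical heuristic: favor (model_i, model_j, sample) tuples
    that have been drawn less often so far."""
    weights = 1.0 / (1.0 + counts.astype(float))
    # A model is never compared against itself, so zero out i == j.
    idx = np.arange(counts.shape[0])
    weights[idx, idx, :] = 0.0
    return weights / weights.sum()


def integrate(p_acc, p_con, p_sca):
    """Assumed integration rule: element-wise product, then renormalize."""
    combined = p_acc * p_con * p_sca
    total = combined.sum()
    if total <= 0.0:
        raise ValueError("combined sampling mass is zero")
    return combined / total


def sample_tuple(p, rng):
    """Draw one (model_i, model_j, sample) index triple from tensor p."""
    flat = rng.choice(p.size, p=p.ravel())
    i, j, s = np.unravel_index(flat, p.shape)
    return int(i), int(j), int(s)


rng = np.random.default_rng(0)
counts = np.zeros((3, 3, 4), dtype=int)  # 3 models, 4 samples
counts[0, 1, 2] = 5                      # this tuple was already drawn 5 times

# Placeholder: the same heuristic stands in for all three matrices here.
p_acc = uniformity_weights(counts)
p_con = uniformity_weights(counts)
p_sca = uniformity_weights(counts)

p = integrate(p_acc, p_con, p_sca)
i, j, s = sample_tuple(p, rng)
```

In this sketch the already frequent tuple `(0, 1, 2)` receives a much smaller probability than its untouched neighbors, which is the kind of uniformity pressure the abstract describes; the real framework builds each matrix for a different objective.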
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms
Methods: ALIGN · Focus
