UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization

Peiwen Yuan; Shaoxiong Feng; Yiwei Li; Xinglin Wang; Yueqi Zhang; Jiayi Shi; Chuyi Tan; Boyuan Pan; Yao Hu; Kan Li

arXiv:2502.11454·cs.CL·February 18, 2025

TL;DR

UniCBE is a unified evaluation framework for large language models that improves accuracy, efficiency, and scalability by optimizing multiple core objectives through uniformity-driven sampling strategies.

Contribution

The paper introduces a unified framework that simultaneously optimizes the key factors in comparing-based evaluation (CBE), improving accuracy and scalability over existing methods.

Findings

01. Achieves over 17% savings in evaluation budgets on AlpacaEval.

02. Attains Pearson correlation exceeding 0.995 with ground truth.

03. Reduces evaluation costs by over 50% in continuous model deployment scenarios.
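
The Pearson correlation mentioned in the findings measures how linearly an estimated set of model scores tracks the ground-truth scores. The following is a minimal sketch of that metric only. The function name `pearson` and the five score values are hypothetical and chosen for illustration; they are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five models (illustrative numbers only).
ground_truth = [0.82, 0.64, 0.51, 0.47, 0.30]
estimated = [0.80, 0.66, 0.50, 0.45, 0.33]
r = pearson(ground_truth, estimated)
print(f"Pearson r = {r:.4f}")
```

A value close to 1 means the estimated scores preserve the relative spacing of the ground-truth scores, not just their order.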

Abstract

Human preference plays a significant role in measuring large language models and guiding them to align with human values. Unfortunately, current comparing-based evaluation (CBE) methods typically focus on a single optimization objective and thus fail to make effective use of scarce yet valuable preference signals. To address this, we delve into the key factors that can enhance the accuracy, convergence, and scalability of CBE: suppressing sampling bias, balancing the descending process of uncertainty, and mitigating updating uncertainty. Following the derived guidelines, we propose UniCBE, a unified uniformity-driven CBE framework that simultaneously optimizes these core objectives by constructing and integrating three decoupled sampling probability matrices, each designed to ensure uniformity in specific aspects. We further ablate the optimal tuple sampling and preference aggregation strategies to…
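
The idea of integrating several sampling probability matrices can be sketched as follows. The abstract does not state how the three matrices are built or combined, so everything here is an assumption made for illustration: the matrices are indexed by model pairs only, they are combined by an elementwise product followed by renormalization, and the helper `uniformity_weights` (which favors pairs that have been compared less often) is a hypothetical stand-in for one uniformity objective. None of the names below come from the paper.

```python
import random


def normalize(matrix):
    """Scale a non-negative matrix so that all entries sum to 1."""
    total = sum(sum(row) for row in matrix)
    if total <= 0:
        raise ValueError("matrix must have positive total mass")
    return [[value / total for value in row] for row in matrix]


def uniformity_weights(counts):
    """Hypothetical objective: favor pairs that were compared less often."""
    n = len(counts)
    raw = [[0.0 if i == j else 1.0 / (1 + counts[i][j]) for j in range(n)]
           for i in range(n)]
    return normalize(raw)


def integrate(matrices):
    """Assumed combination rule: elementwise product, then renormalize."""
    n = len(matrices[0])
    combined = [[1.0] * n for _ in range(n)]
    for matrix in matrices:
        for i in range(n):
            for j in range(n):
                combined[i][j] *= matrix[i][j]
    return normalize(combined)


def sample_pair(prob, rng):
    """Draw one (i, j) model pair according to the combined probabilities."""
    n = len(prob)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    weights = [prob[i][j] for i, j in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


# Illustrative comparison counts for three models; pair (1, 2) is unseen.
counts = [[0, 5, 1], [5, 0, 0], [1, 0, 0]]
p_a = uniformity_weights(counts)
# Two further placeholder matrices standing in for the other objectives.
p_b = normalize([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
p_c = normalize([[0, 1, 1], [1, 0, 3], [1, 3, 0]])

combined = integrate([p_a, p_b, p_c])
pair = sample_pair(combined, random.Random(0))
print("sampled model pair:", pair)
```

Keeping the three matrices separate until the final combination step mirrors the "decoupled" wording in the abstract: each objective can be computed and inspected on its own before the sampling distribution is formed.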

Taxonomy

Topics: Advanced Multi-Objective Optimization Algorithms

Methods: ALIGN · Focus