Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging

Xiaobo Liang, Haoke Zhang, Helan Hu, Juntao Li, Jun Xu, Min Zhang

arXiv:2405.12163 · cs.CL · May 21, 2024


TL;DR

Fennec introduces a fine-grained evaluation and correction framework for language models that dissects tasks into multiple dimensions and combines datasets to improve evaluation accuracy and response quality, approaching GPT-4's performance.

Contribution

The paper presents Fennec, a framework that uses branching and bridging operations to improve open-source language models as evaluators and correctors of model responses.

Findings

1. Fennec's 7B model outperforms larger open-source models on benchmarks.
2. Fennec's correction improves response quality by 1 to 2 points on MT-Bench.
3. Fennec's evaluation performance closely approaches that of GPT-4.

Abstract

The rapid advancement of large language models has given rise to a plethora of applications across a myriad of real-world tasks, mainly centered on aligning with human intent. However, the complexities inherent in human intent necessitate a dependence on labor-intensive and time-consuming human evaluation. To alleviate this constraint, we delve into the paradigm of employing open-source large language models as evaluators, aligning with the prevailing trend of utilizing GPT-4. In particular, we present a step-by-step evaluation framework, Fennec, capable of Fine-grained EvaluatioN and correctioN Extended through branChing and bridging. Specifically, the branching operation dissects the evaluation task into various dimensions and granularities, thereby alleviating the challenges associated with evaluation. Concurrently, the…
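The general idea behind the branching operation, splitting one holistic judgment into several per-dimension judgments and then combining them, can be illustrated with a minimal Python sketch. This is not Fennec's implementation: the fixed criteria, the names `Criterion`, `branch`, `evaluate` and `toy_judge`, the 1 to 10 score scale, and the plain-mean aggregation are all hypothetical choices made for illustration. `toy_judge` stands in for a call to an evaluator LLM, and the bridging operation is not modeled at all.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Criterion:
    name: str         # one evaluation dimension (hypothetical examples below)
    description: str  # what the judge should check for this dimension


# Hypothetical judge signature: (query, response, criterion) -> score on a 1-10 scale.
Judge = Callable[[str, str, Criterion], float]


def branch(query: str) -> list[Criterion]:
    """Hypothetical branching step: split one evaluation task into dimensions.

    A real evaluator model would generate query-specific criteria; here they
    are fixed so the example stays deterministic.
    """
    return [
        Criterion("relevance", "Does the response address the user's request?"),
        Criterion("accuracy", "Are the stated facts correct?"),
        Criterion("clarity", "Is the response well organized and easy to follow?"),
    ]


def evaluate(query: str, response: str, judge: Judge) -> tuple[dict[str, float], float]:
    """Score each branched dimension separately, then aggregate with a simple mean."""
    per_dim = {c.name: judge(query, response, c) for c in branch(query)}
    return per_dim, mean(per_dim.values())


def toy_judge(query: str, response: str, criterion: Criterion) -> float:
    # Stand-in for an LLM call: returns fixed scores per dimension.
    return {"relevance": 9.0, "accuracy": 7.0, "clarity": 8.0}[criterion.name]


scores, overall = evaluate("Explain recursion.", "Recursion is when a function calls itself.", toy_judge)
print(scores)   # {'relevance': 9.0, 'accuracy': 7.0, 'clarity': 8.0}
print(overall)  # 8.0
```

Keeping the per-dimension scores alongside the aggregate makes the judgment inspectable: a low score on one dimension points at what a subsequent correction step would need to fix.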


Code & Models

Repositories

dropreg/fennec (official)


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Dense Connections · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Absolute Position Encodings · Byte Pair Encoding · Adam · Dropout