TALEC: Teach Your LLM to Evaluate in Specific Domain with In-house Criteria by Criteria Division and Zero-shot Plus Few-shot

Kaiqi Zhang; Shuai Yuan; Honghan Zhao

arXiv:2407.10999 · cs.CL · September 25, 2025

TL;DR

TALEC is a flexible, model-based evaluation method for large language models that uses in-context learning to assess in-house criteria in specific domains, achieving high correlation with human judgments.

Contribution

The paper introduces TALEC, a novel approach combining in-context learning and prompt engineering to evaluate LLMs based on customizable in-house criteria, replacing traditional manual methods.

Findings

01. TALEC achieves over 80% correlation with human judgments.

02. It outperforms inter-human agreement in some tasks.

03. Fine-tuning can be replaced by in-context learning for evaluation.
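As a rough illustration of what "correlation with human judgments" can mean in practice, the sketch below computes a Spearman rank correlation between judge-model scores and human scores. This page does not say which correlation statistic the paper reports, so Spearman is an assumption here, and the function names and the score lists are hypothetical placeholders rather than data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over every value tied with the one at `pos`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical scores for five responses (not taken from the paper).
human_scores = [1, 0, 1, 1, 0]
judge_scores = [1, 0, 1, 0, 0]
print(f"Spearman correlation: {spearman(human_scores, judge_scores):.3f}")
```

The same helper can be applied to two human annotators' scores to obtain an inter-human baseline for comparison.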

Abstract

With the rapid development of large language models (LLMs), the evaluation of LLMs becomes increasingly important. Measuring text generation tasks such as summarization and article creation is very difficult. This is especially true in specific application domains (e.g., to-business or to-customer service), where in-house evaluation criteria have to meet not only general standards (correctness, helpfulness, creativity, etc.) but also the specific needs of customers and business security requirements at the same time, making the evaluation more difficult. So far, the evaluation of LLMs in business scenarios has mainly relied on manual evaluation, which is expensive and time-consuming. In this paper, we propose a model-based evaluation method: TALEC, which allows users to flexibly set their own evaluation criteria, and uses in-context learning (ICL) to teach the judge model these in-house criteria. In addition, we try…
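One way to picture user-defined criteria being taught to a judge model through in-context learning is sketched below. It is not the authors' implementation: the `Criterion` class, the prompt wording, and the pass/fail verdict format are all hypothetical, and "criteria division" is read here simply as building one prompt per criterion. A criterion with no examples yields a zero-shot prompt, and one with labeled examples yields a few-shot prompt. No model is called; the sketch only assembles the prompt text.

```python
from dataclasses import dataclass, field


@dataclass
class Criterion:
    """A user-defined evaluation criterion (hypothetical structure)."""
    name: str
    description: str
    # Optional (response, verdict) pairs used as few-shot demonstrations.
    examples: list = field(default_factory=list)


def build_prompt(criterion, response):
    """Assemble a judge prompt that covers exactly one criterion."""
    lines = [
        f"You are judging one criterion only: {criterion.name}.",
        f"Criterion: {criterion.description}",
    ]
    # With no examples this loop adds nothing, giving a zero-shot prompt.
    for i, (ex_response, ex_verdict) in enumerate(criterion.examples, start=1):
        lines.append(f"Example {i} response: {ex_response}")
        lines.append(f"Example {i} verdict: {ex_verdict}")
    lines.append(f"Response to judge: {response}")
    lines.append("Verdict (pass or fail):")
    return "\n".join(lines)


def build_prompts(criteria, response):
    """Divide the evaluation into one prompt per criterion."""
    return {c.name: build_prompt(c, response) for c in criteria}


# Hypothetical in-house criteria for a customer-service assistant.
criteria = [
    Criterion("correctness", "The answer contains no factual errors."),
    Criterion(
        "security",
        "The answer does not reveal internal business information.",
        examples=[("Our internal discount code is X.", "fail")],
    ),
]
prompts = build_prompts(criteria, "You can reset your password in Settings.")
print(prompts["security"])
```

Each prompt would then be sent to the judge model separately, and the per-criterion verdicts combined however the in-house rubric requires.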

Code & Models

Repositories

zlkqz/auto_eval (Official)

Taxonomy

Topics: Educational and Technological Research · Imbalanced Data Classification Techniques · Smart Systems and Machine Learning

Methods: Sparse Evolutionary Training · Focus