TALEC: Teach Your LLM to Evaluate in Specific Domain with In-house Criteria by Criteria Division and Zero-shot Plus Few-shot
Kaiqi Zhang, Shuai Yuan, Honghan Zhao

TL;DR
TALEC is a flexible, model-based method for evaluating large language models in specific domains: users define their own in-house criteria, in-context learning teaches those criteria to a judge model, and the resulting judgments correlate over 80% with human judgments.
Contribution
The paper introduces TALEC, an approach that combines in-context learning and prompt engineering to evaluate LLMs against customizable in-house criteria, as an alternative to the expensive and time-consuming manual evaluation that business scenarios have mainly relied on.
Findings
TALEC achieves over 80% correlation with human judgments.
On some tasks, its correlation with human judgments exceeds the correlation between human annotators themselves.
Fine-tuning can be replaced by in-context learning for evaluation.
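As a rough illustration of what a correlation figure like the one above measures, the sketch below computes a Pearson correlation between judge-model scores and human scores. It is a minimal sketch, not the paper's evaluation code: this page does not say which correlation statistic the authors used, so Pearson is an assumption, and the function name and the score lists are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Made-up scores for five responses (NOT data from the paper).
human_scores = [1, 2, 3, 4, 5]
judge_scores = [1, 2, 3, 5, 4]

print(round(pearson(human_scores, judge_scores), 3))  # prints 0.9
```

The same function applied to two human annotators' scores gives an inter-human correlation, which is the baseline the second finding compares against.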
Abstract
With the rapid development of large language models (LLMs), evaluating LLMs becomes increasingly important. Measuring text generation tasks such as summarization and article creation is very difficult. Especially in specific application domains (e.g., to-business or to-customer services), in-house evaluation criteria have to meet not only general standards (correctness, helpfulness, creativity, etc.) but also the specific needs of customers and business security requirements at the same time, making evaluation even more difficult. So far, the evaluation of LLMs in business scenarios has mainly relied on manual evaluation, which is expensive and time-consuming. In this paper, we propose a model-based evaluation method, TALEC, which allows users to flexibly set their own evaluation criteria and uses in-context learning (ICL) to teach the judge model these in-house criteria. In addition, we try…
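The idea of teaching a judge model in-house criteria through in-context learning can be sketched as prompt construction: one prompt per criterion (the "criteria division" named in the title), with optional labeled examples placed before the response to be judged. This is only a sketch under stated assumptions. The prompt layout, the names `Shot`, `build_judge_prompt`, and `build_prompts`, and the sample criteria are all hypothetical; the excerpt above does not give the paper's actual prompts.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    """One labeled example for a criterion (hypothetical structure)."""
    response: str
    verdict: str  # e.g. "pass" or "fail"


def build_judge_prompt(criterion, shots, candidate):
    """Build a judge prompt for a single criterion.

    With an empty `shots` list this is a zero-shot prompt; otherwise the
    labeled examples are shown before the candidate response.
    """
    lines = [f"Criterion: {criterion}", ""]
    for i, shot in enumerate(shots, start=1):
        lines += [
            f"Example {i}",
            f"Response: {shot.response}",
            f"Verdict: {shot.verdict}",
            "",
        ]
    lines += ["Now judge this response.", f"Response: {candidate}", "Verdict:"]
    return "\n".join(lines)


def build_prompts(criteria, candidate):
    """Criteria division: one independent prompt per criterion."""
    return {
        criterion: build_judge_prompt(criterion, shots, candidate)
        for criterion, shots in criteria.items()
    }


# Illustrative in-house criteria (invented, not from the paper).
criteria = {
    "The response must not reveal internal pricing.": [
        Shot("Our internal cost is 3 USD per unit.", "fail"),
        Shot("Please contact sales for a quote.", "pass"),
    ],
    "The response must be polite.": [],  # zero-shot: no examples
}

for prompt in build_prompts(criteria, "Happy to help with that!").values():
    print(prompt)
    print("---")
```

Each prompt would then be sent to whichever judge model is in use; that call is left out because it depends on the model provider.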
Taxonomy
Topics: Educational and Technological Research · Imbalanced Data Classification Techniques · Smart Systems and Machine Learning
Methods: Sparse Evolutionary Training · Focus
