Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation
Jiatong Li, Rui Li, Qi Liu

TL;DR
This paper introduces a novel deep interaction-based framework for evaluating large language models in dynamic, real-world scenarios, overcoming limitations of static datasets and costly human assessments.
Contribution
It proposes a general evaluation framework that assesses LLMs through their interactions with other models across various real-world tasks, enabling scalable and dynamic evaluation.
Findings
The framework is effective for evaluating LLMs in real-world domains
It applies to multiple tasks, such as translation and code generation
Its effectiveness is demonstrated through extensive experiments
Abstract
Large Language Models (LLMs) have made progress in various real-world tasks, which creates demand for methods to evaluate them. Existing LLM evaluation methods are mainly supervised signal-based: they depend on static datasets and cannot evaluate the ability of LLMs in dynamic real-world scenarios, where deep interaction is common. Other LLM evaluation methods are human-based; they are costly and time-consuming, and cannot support large-scale evaluation of LLMs. To address these issues, we propose a novel Deep Interaction-based LLM-evaluation framework. In our proposed framework, LLMs' performance in real-world domains can be evaluated from their deep interaction with other LLMs in carefully designed evaluation tasks. Furthermore, our proposed framework is a general evaluation method that can be applied to a host of real-world tasks such as machine translation and…
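To make the idea of interaction-based evaluation concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `interact`, `score`, `echo_model`, `terse_model`, and `length_judge` are hypothetical, the "models" are stub functions rather than real LLMs, and the turn-taking loop and averaging judge are assumptions chosen only to illustrate scoring a model from a multi-round exchange instead of from a static dataset.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a "model" is any function mapping a prompt to a reply.
Model = Callable[[str], str]


def interact(model_a: Model, model_b: Model, opening: str,
             rounds: int) -> List[Tuple[str, str]]:
    """Alternate turns between two models; return (speaker, message) pairs."""
    transcript: List[Tuple[str, str]] = []
    message = opening
    for _ in range(rounds):
        message = model_a(message)      # A replies to the latest message
        transcript.append(("A", message))
        message = model_b(message)      # B replies to A
        transcript.append(("B", message))
    return transcript


def score(transcript: List[Tuple[str, str]], speaker: str,
          judge: Callable[[str], float]) -> float:
    """Average the judge's per-message score over one speaker's turns."""
    turns = [msg for who, msg in transcript if who == speaker]
    return sum(judge(m) for m in turns) / len(turns) if turns else 0.0


# Stub stand-ins for real LLMs (assumptions for illustration only).
def echo_model(prompt: str) -> str:
    return prompt.upper()


def terse_model(prompt: str) -> str:
    return prompt[:5]


# Toy judge: longer messages score higher. A real judge would be task-specific.
def length_judge(message: str) -> float:
    return float(len(message))


transcript = interact(echo_model, terse_model, "hello world", rounds=2)
score_a = score(transcript, "A", length_judge)
score_b = score(transcript, "B", length_judge)
```

In a real setting the stubs would be replaced by calls to actual models and the judge by a task-specific metric (for example, a translation quality measure); the sketch only shows that each model's score depends on what the other model said in earlier turns.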
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
