IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering
Ruosen Li, Ruochen Li, Barry Wang, Xinya Du

TL;DR
IQA-EVAL is an automatic, scalable framework that uses LLMs to simulate human interactions with large language models in interactive question answering and then evaluate those models. Its results closely align with human judgments.
Contribution
The paper presents a novel LLM-based evaluation agent that simulates human behavior and interactions for automatic evaluation of LLMs in interactive QA tasks, improving scalability and correlation with human assessments.
Findings
High correlation between IQA-EVAL and human evaluations.
Persona assignment enhances evaluation accuracy.
Efficient evaluation of multiple LLMs on complex questions.
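Agreement between an automatic evaluator and human raters is commonly summarized with a correlation coefficient over per-item or per-model scores. The minimal Python sketch below computes a Pearson correlation. It assumes Pearson is the statistic of interest, which may not be the exact set the paper reports. The function name `pearson` and the sample scores are hypothetical illustrations, not data from the paper.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists (sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation terms
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


# Hypothetical scores for four models (illustrative only, not from the paper)
human_scores = [3.1, 4.2, 2.5, 3.8]
auto_scores = [3.0, 4.0, 2.7, 3.9]
print(round(pearson(human_scores, auto_scores), 3))
```

A value near 1 would indicate that the automatic evaluator ranks and spaces models much as human raters do. Rank-based alternatives such as Spearman or Kendall correlation are less sensitive to differences in score scale.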
Abstract
To evaluate Large Language Models (LLMs) for question answering (QA), traditional methods typically focus on assessing single-turn responses to given questions. However, this approach does not capture the dynamic nature of human-AI interactions, in which humans actively seek information through conversation. Recent works in human-computer interaction (HCI) have employed human evaluators to conduct and assess such interactions, but this is often prohibitively expensive and time-consuming to scale. We introduce IQA-EVAL, an automatic evaluation framework for Interactive Question Answering Evaluation. More specifically, we introduce an LLM-based Evaluation Agent (LEA) that can: (1) simulate human behaviors to generate interactions with IQA models; (2) automatically evaluate the generated interactions. Moreover, we propose assigning personas to LEAs to better simulate groups of real…
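The abstract describes LEA as a two-stage procedure. First, an LLM plays a (persona-conditioned) human and converses with the IQA model. Second, the same agent scores the resulting transcript. The minimal Python sketch below outlines that loop. It assumes a generic prompt-in/text-out LLM interface. The names `simulate_interaction` and `evaluate_interaction`, the `FINAL ANSWER:` stop marker, the prompt wording, the metric names, and the 1-5 scale are all hypothetical illustrations, not the paper's actual prompts or code.

```python
import re
from typing import Callable

# Hypothetical interface: an LLM is any function mapping a prompt to a reply.
LLM = Callable[[str], str]


def simulate_interaction(lea: LLM, iqa_model: LLM, question: str,
                         persona: str, max_turns: int = 5) -> list[tuple[str, str]]:
    """Stage 1 (sketch): the LEA, playing a persona, converses with the IQA model."""
    history: list[tuple[str, str]] = []
    for _ in range(max_turns):
        transcript = "\n".join(f"{who}: {text}" for who, text in history)
        query = lea(
            f"You are {persona}. Your task is to answer: {question}\n"
            f"Conversation so far:\n{transcript}\n"
            "Ask the assistant your next question, or reply "
            "'FINAL ANSWER: <answer>' once you are confident."
        )
        history.append(("LEA", query))
        if query.startswith("FINAL ANSWER:"):  # hypothetical stop marker
            break
        history.append(("IQA", iqa_model(query)))
    return history


def evaluate_interaction(lea: LLM, history: list[tuple[str, str]],
                         metrics: list[str]) -> dict[str, int]:
    """Stage 2 (sketch): the same LEA rates the transcript on each metric (1-5)."""
    transcript = "\n".join(f"{who}: {text}" for who, text in history)
    scores: dict[str, int] = {}
    for metric in metrics:
        reply = lea(f"Rate the assistant's {metric} from 1 to 5.\n{transcript}\nScore:")
        match = re.search(r"[1-5]", reply)  # first digit 1-5 in the reply
        scores[metric] = int(match.group()) if match else 0  # 0 = unparseable
    return scores


# Usage with stub models standing in for real LLM calls
def stub_lea(prompt: str) -> str:
    if prompt.startswith("Rate"):
        return "4"
    return "FINAL ANSWER: Paris" if "IQA:" in prompt else "What is the capital of France?"


def stub_iqa(prompt: str) -> str:
    return "The capital of France is Paris."


history = simulate_interaction(stub_lea, stub_iqa, "What is the capital of France?",
                               persona="a curious student")
print(history)
print(evaluate_interaction(stub_lea, history, ["helpfulness", "fluency"]))
```

Persona assignment enters only through the `persona` string in the stage-1 prompt. Under this sketch's assumptions, running the same question with several personas would approximate evaluating a model against different groups of users.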
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Intelligent Tutoring Systems and Adaptive Learning
Methods: Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Absolute Position Encodings
