Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA
Maharshi Gor, Hal Daumé III, Tianyi Zhou, Jordan Boyd-Graber

TL;DR
This paper introduces CAIMIRA, a framework based on item response theory, to quantitatively compare human and AI question-answering abilities, revealing distinct strengths and weaknesses in knowledge and reasoning domains.
Contribution
The paper presents CAIMIRA, a novel IRT-based framework for assessing and comparing human and AI problem-solving skills in QA tasks, with extensive empirical analysis.
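For readers unfamiliar with item response theory, the sketch below shows the standard two-parameter logistic (2PL) model that IRT-based frameworks build on, plus a simple multidimensional variant. It is a minimal illustration, not CAIMIRA's actual formulation: the function names, the multidimensional form, and all numeric values are hypothetical and chosen only to show how agents with different skill profiles can get different success probabilities on the same question.

```python
import math


def irt_2pl(ability: float, difficulty: float, discrimination: float = 1.0) -> float:
    """Standard 2PL IRT: probability that an agent answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def irt_multidim(skills, difficulties, relevance) -> float:
    """Hypothetical multidimensional variant (an assumption, not the paper's model).

    Each latent dimension contributes its skill-minus-difficulty gap,
    weighted by how relevant that dimension is to the question.
    """
    logit = sum(r * (s - d) for s, d, r in zip(skills, difficulties, relevance))
    return 1.0 / (1.0 + math.exp(-logit))


# Made-up skill profiles over two illustrative dimensions:
# index 0 = abductive reasoning, index 1 = fact retrieval.
human_skills = [1.0, 0.2]
ai_skills = [0.2, 1.0]

# A made-up question that depends mostly on abductive reasoning.
question_difficulty = [0.5, 0.5]
question_relevance = [0.9, 0.1]

p_human = irt_multidim(human_skills, question_difficulty, question_relevance)
p_ai = irt_multidim(ai_skills, question_difficulty, question_relevance)
print(f"human: {p_human:.3f}, ai: {p_ai:.3f}")
```

With these illustrative numbers the human profile scores higher on the abduction-heavy question; swapping the relevance weights would favor the AI profile instead, which is the kind of complementarity the framework is meant to surface.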
Findings
Humans outperform AI systems in knowledge-grounded abductive and conceptual reasoning.
State-of-the-art LLMs such as GPT-4 and LLaMA outperform humans in targeted information retrieval and fact-based reasoning.
Different proficiency patterns highlight complementary strengths between humans and AI.
Abstract
Recent advancements of large language models (LLMs) have led to claims of AI surpassing humans in natural language processing (NLP) tasks such as textual understanding and reasoning. This work investigates these assertions by introducing CAIMIRA, a novel framework rooted in item response theory (IRT) that enables quantitative assessment and comparison of problem-solving abilities of question-answering (QA) agents: humans and AI systems. Through analysis of over 300,000 responses from ~70 AI systems and 155 humans across thousands of quiz questions, CAIMIRA uncovers distinct proficiency patterns in knowledge domains and reasoning skills. Humans outperform AI systems in knowledge-grounded abductive and conceptual reasoning, while state-of-the-art LLMs like GPT-4 and LLaMA show superior performance on targeted information retrieval and fact-based reasoning, particularly when information…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Advanced Text Analysis Techniques
Methods: Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Attention Is All You Need · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings
