Do LLMs and Humans Find the Same Questions Difficult? A Case Study on Japanese Quiz Answering
Naoya Sugiura, Kosuke Yamada, Yasuhiro Ogawa, Katsuhiko Toyama, Ryohei Sasano

TL;DR
This study compares how difficult Japanese quiz questions are for LLMs and for humans, and finds that, relative to humans, LLMs struggle more with questions whose correct answers are not covered by Wikipedia entries and with questions that require numerical answers.
Contribution
It provides a comparative analysis of question difficulty between LLMs and humans in Japanese quizzes, highlighting specific areas where LLMs underperform.
Findings
Compared with humans, LLMs struggle more with questions whose correct answers are not covered by Wikipedia entries.
LLMs also have difficulty with questions that require numerical answers.
Humans are comparatively less affected than LLMs when the correct answer has no Wikipedia entry.
Abstract
LLMs have achieved performance that surpasses humans in many NLP tasks. However, it remains unclear whether problems that are difficult for humans are also difficult for LLMs. This study investigates how the difficulty of quizzes in a buzzer setting differs between LLMs and humans. Specifically, we first collect Japanese quiz data including questions, answers, and human correct response rates, then prompt LLMs to answer the quizzes under several settings, and compare their correct answer rates to those of humans from two analytical perspectives. The experimental results show that, compared to humans, LLMs struggle more with quizzes whose correct answers are not covered by Wikipedia entries, and also have difficulty with questions that require numerical answers.
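The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the `QuizItem` fields, the `answer_in_wikipedia` flag, and the toy values are all assumptions made up for illustration and do not reflect the paper's data or results. The sketch groups questions by whether the correct answer has a Wikipedia entry and reports the mean human correct response rate next to the LLM correct answer rate for each group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QuizItem:
    # All field names are hypothetical, chosen for this illustration.
    question: str
    human_rate: float          # fraction of human players who answered correctly
    llm_correct: bool          # whether the LLM's answer matched the gold answer
    answer_in_wikipedia: bool  # whether the gold answer has a Wikipedia entry


def rates_by_group(items):
    """Map each Wikipedia-coverage flag to (mean human rate, LLM rate, count)."""
    groups = defaultdict(list)
    for item in items:
        groups[item.answer_in_wikipedia].append(item)

    summary = {}
    for flag, members in groups.items():
        n = len(members)
        human = sum(m.human_rate for m in members) / n
        llm = sum(m.llm_correct for m in members) / n  # True counts as 1
        summary[flag] = (human, llm, n)
    return summary


# Toy data, invented for this sketch only.
items = [
    QuizItem("q1", 0.50, True, True),
    QuizItem("q2", 0.25, True, True),
    QuizItem("q3", 0.50, False, False),
    QuizItem("q4", 0.75, True, False),
]

summary = rates_by_group(items)
for flag, (human, llm, n) in sorted(summary.items()):
    label = "covered by Wikipedia" if flag else "not covered"
    print(f"{label}: n={n}, human={human:.3f}, llm={llm:.3f}")
```

The same grouping could be keyed on a different attribute, such as whether the answer is numerical, to cover the second perspective mentioned in the abstract.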
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Text Readability and Simplification
