What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning

Shashidhar Reddy Javaji; Zining Zhu

arXiv:2409.17172 · cs.CL · July 9, 2025

Open Access

TL;DR

This paper introduces a new evaluation framework for large language models to assess their ability to acquire new knowledge through curiosity-driven questioning, revealing that smaller models can be as effective as larger ones.

Contribution

The paper proposes a novel framework for evaluating LLMs' knowledge acquisition via question generation, validated with synthetic datasets and human assessments.

Findings

1. Large models like GPT-4 generate high-quality questions.
2. Smaller models like Phi-2 are equally or more effective.
3. Model size does not solely determine knowledge acquisition potential.

Abstract

Large language models (LLMs) can store a massive amount of knowledge, yet their potential to acquire new knowledge remains unknown. We propose a novel evaluation framework that evaluates this capability. This framework prompts LLMs to generate questions about a statement introducing scientific knowledge, simulating a curious person when facing the statement for the first time. We score the qualities of the generated questions, thereby evaluating the knowledge acquisition potential of the LLM. We apply controlled ablation studies to validate our scoring procedures. Additionally, we created a synthetic dataset consisting of 1101 statements in physics, chemistry, and maths with distinct levels of difficulty, 300 general knowledge statements, and 567 incorrect statements. Human evaluations were conducted to validate our model assessments, achieving an approximate weighted Cohen's kappa of…
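The abstract reports human-model agreement as a weighted Cohen's kappa. As a rough illustration of how such a statistic can be computed, here is a minimal sketch in plain Python. The weighting scheme (linear or quadratic), the 1-to-5 rating scale, and the sample ratings are all assumptions made for this example; the abstract does not state which variant or scale the authors used.

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Weighted Cohen's kappa for two raters over ordered categories.

    `weighting` is "linear" or "quadratic"; which one the paper uses is
    not stated in the abstract, so both are offered here as assumptions.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two categories")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: observed[i][j] is the share of items
    # rated category i by rater A and category j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    # Weighted observed vs. chance-expected disagreement.
    num = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    den = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    if den == 0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return 1.0 - num / den


# Hypothetical ratings on an assumed 1-5 scale (not data from the paper).
human_scores = [5, 4, 4, 2, 3, 1, 5, 3]
model_scores = [5, 4, 3, 2, 3, 2, 4, 3]
kappa = weighted_kappa(human_scores, model_scores, categories=[1, 2, 3, 4, 5])
print(f"linear weighted kappa: {kappa:.3f}")
```

A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance. Weighting penalizes near-misses (for example, 4 versus 5) less than distant disagreements (1 versus 5), which suits ordinal quality scores.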

Taxonomy

Topics: Psychological and Educational Research Studies

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding