Designing and Evaluating Chain-of-Hints for Scientific Question Answering
Anubhav Jangra, Smaranda Muresan

TL;DR
This paper evaluates 18 open-source large language models on generating chains of hints for scientific question answering, comparing static and dynamic hinting strategies to improve educational engagement and understanding.
Contribution
It introduces and compares static and dynamic hinting strategies using open-source LLMs, providing insights into their effectiveness and user preferences in educational contexts.
Findings
Dynamic hints adapt to learner progress, enhancing engagement.
Automatic metrics have limitations in capturing user preferences.
User preferences vary across hinting strategies.
Abstract
LLMs are reshaping education, with students increasingly relying on them for learning. When implemented using general-purpose models, these systems are likely to give away the answers, potentially undermining conceptual understanding and critical thinking. Prior work shows that hints can effectively promote cognitive engagement. Building on this insight, we evaluate 18 open-source LLMs on generating chains of hints that scaffold users toward the correct answer. We compare two distinct hinting strategies: static hints, pre-generated for each problem, and dynamic hints, adapted to a learner's progress. We evaluate these systems on five pedagogically grounded automatic metrics for hint quality. Using the best-performing LLM as the backbone of a quantitative study with 41 participants, we uncover distinct user preferences across hinting strategies, and identify the limitations of automatic…
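The difference between the two hinting strategies can be illustrated with a small sketch. This is a minimal illustration under our own assumptions, not the authors' implementation: `generate` stands in for an LLM call, answers are checked by exact string match, and the names `static_chain`, `static_session`, `dynamic_session`, and `toy_generate` are hypothetical. Static hints are produced once per problem without seeing the learner; dynamic hints are generated after each incorrect attempt, conditioned on the attempts so far.

```python
from typing import Callable, List, Tuple

# Hypothetical hint generator: (question, context) -> next hint.
# In a real system this would be an LLM call; here it is a stand-in.
HintFn = Callable[[str, List[str]], str]


def static_chain(question: str, generate: HintFn, n_hints: int) -> List[str]:
    """Static strategy: pre-generate the whole chain once, before any learner input."""
    hints: List[str] = []
    for _ in range(n_hints):
        # Each hint is conditioned only on the question and the earlier hints.
        hints.append(generate(question, hints))
    return hints


def static_session(answer: str, attempts: List[str], hints: List[str]) -> Tuple[List[str], bool]:
    """Reveal the pre-generated hints one at a time after each wrong attempt."""
    shown: List[str] = []
    for attempt in attempts:
        if attempt == answer:  # assumption: exact-match answer checking
            return shown, True
        if len(shown) >= len(hints):
            break  # chain exhausted
        shown.append(hints[len(shown)])
    return shown, False


def dynamic_session(question: str, answer: str, attempts: List[str],
                    generate: HintFn, max_hints: int) -> Tuple[List[str], bool]:
    """Dynamic strategy: generate each hint after seeing the learner's latest attempt."""
    history: List[str] = []
    shown: List[str] = []
    for attempt in attempts:
        if attempt == answer:
            return shown, True
        if len(shown) >= max_hints:
            break  # hint budget used up
        history.append(f"learner: {attempt}")
        hint = generate(question, history)  # conditioned on the learner's progress
        history.append(f"hint: {hint}")
        shown.append(hint)
    return shown, False


def toy_generate(question: str, context: List[str]) -> str:
    """Deterministic stand-in for an LLM, so the sketch runs without a model."""
    return f"hint given {context[-1]}" if context else "first hint"


if __name__ == "__main__":
    hints = static_chain("What is 6 x 7?", toy_generate, 3)
    print(static_session("42", ["10", "20", "42"], hints))
    print(dynamic_session("What is 6 x 7?", "42", ["10", "42"], toy_generate, 3))
```

The design choice the sketch highlights is where the learner enters the loop: the static chain can be cached per problem, while each dynamic hint requires a fresh generation call that sees the learner's latest attempt.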
