Are Frontier Large Language Models Suitable for Q&A in Science Centres?
Jacob Watson, Fabrício Góes, Marco Volpe, Talles Medeiros

TL;DR
This study evaluates how well leading large language models answer science questions for children, highlighting their strengths in engagement and clarity as well as a trade-off between creativity and factual accuracy.
Contribution
It provides a comparative analysis of GPT-4, Claude 3.5 Sonnet, and Google Gemini 1.5 in a real-world educational context, emphasizing the importance of prompt design.
Findings
Claude outperformed GPT and Gemini in clarity and engagement.
Higher creativity correlated with lower factual accuracy.
Models show potential but require careful prompt engineering.
Abstract
This paper investigates the suitability of frontier Large Language Models (LLMs) for Q&A interactions in science centres, with the aim of boosting visitor engagement while maintaining factual accuracy. Using a dataset of questions collected from the National Space Centre in Leicester (UK), we evaluated responses generated by three leading models: OpenAI's GPT-4, Claude 3.5 Sonnet, and Google Gemini 1.5. Each model was prompted for both standard and creative responses tailored to an 8-year-old audience, and these responses were assessed by space science experts based on accuracy, engagement, clarity, novelty, and deviation from expected answers. The results revealed a trade-off between creativity and accuracy, with Claude outperforming GPT and Gemini in both maintaining clarity and engaging young audiences, even when asked to generate more creative responses. Nonetheless, experts…
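The evaluation setup described in the abstract (one standard and one creative prompt per question, with expert ratings on five criteria) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the prompt wording, the function names `build_prompts` and `mean_scores`, and the numeric rating format are all hypothetical, since the abstract does not specify them.

```python
from statistics import mean

# The five assessment criteria named in the abstract.
CRITERIA = ("accuracy", "engagement", "clarity", "novelty", "deviation")


def build_prompts(question: str, age: int = 8) -> dict[str, str]:
    """Return a standard and a creative prompt for one visitor question.

    The wording below is an assumption for illustration only.
    """
    base = f"Answer this question for a {age}-year-old child: {question}"
    return {
        "standard": base,
        "creative": base + " Be as creative and imaginative as you can.",
    }


def mean_scores(ratings: list[dict[str, float]]) -> dict[str, float]:
    """Average expert ratings per criterion (numeric scale is assumed)."""
    return {c: mean(r[c] for r in ratings) for c in CRITERIA}
```

Comparing `mean_scores` for the standard and creative responses of each model would be one simple way to look for the creativity/accuracy trade-off the abstract reports.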
Taxonomy
Topics: Expert finding and Q&A systems
Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Absolute Position Encodings · Residual Connection · Transformer · Adam · Dense Connections · Cosine Annealing
