Language Models as Science Tutors
Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Jameson Aragon, Arturo Rodríguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T. Wang, Zirui Wang, Xindi Wu, Mengzhou Xia, Wenhan Xia, Jiatong Yu

TL;DR
This paper introduces TutorEval, a benchmark, and TutorChat, a dataset, designed respectively to evaluate and to improve language models' ability to assist with scientific education and problem-solving that involves long texts and multi-disciplinary knowledge.
Contribution
The paper presents new benchmarks and datasets tailored for scientific education tasks, and demonstrates fine-tuned models that outperform existing approaches on these benchmarks.
Findings
Fine-tuned models excel at TutorEval and standard math benchmarks.
Fine-tuning base models on existing dialogue datasets leads to poor performance on TutorEval.
Open-source datasets and models are released for community use.
Abstract
NLP has recently made exciting progress toward training language models (LMs) with strong scientific problem-solving skills. However, model development has not focused on real-life use-cases of LMs for science, including applications in education that require processing long scientific documents. To address this, we introduce TutorEval and TutorChat. TutorEval is a diverse question-answering benchmark consisting of questions about long chapters from STEM textbooks, written by experts. TutorEval helps measure real-life usability of LMs as scientific assistants, and it is the first benchmark combining long contexts, free-form generation, and multi-disciplinary scientific knowledge. Moreover, we show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval. Therefore, we create TutorChat, a dataset of 80,000 long synthetic dialogues about…
Taxonomy
Topics: Innovative Teaching and Learning Methods · Intelligent Tutoring Systems and Adaptive Learning
Methods: Balanced Selection
