SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts
Rose E. Wang, Pawan Wirawarn, Noah Goodman, Dorottya Demszky

TL;DR
This paper introduces SIGHT, a large dataset of student feedback from MIT OCW lectures, and demonstrates how large language models can efficiently classify feedback types, revealing valuable insights for improving higher education instruction.
Contribution
The paper presents a new large-scale dataset of student comments and a novel methodology using LLMs for cost-effective feedback classification at scale.
Findings
High correlation between human and model annotations for categories on which human annotators agree consistently
Cost-effective feedback classification at approximately $0.002 per comment
Insights into feedback patterns and quality based on annotation agreement levels
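The agreement findings above compare human and model labels on the same comments. The sketch below shows one common way to quantify that agreement for a single rubric category: Cohen's kappa over binary labels (1 = the comment belongs to the category). This is an illustration under stated assumptions. The choice of Cohen's kappa, the function name `cohens_kappa`, and the toy label lists are hypothetical and are not taken from the paper, which may use a different statistic.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators (e.g. a human and a model).

    Hypothetical helper: each list holds one label per comment, in the same order.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of comments where both annotators give the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every comment.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy binary labels for one category (assumed data, not from SIGHT).
human = [1, 1, 0, 0, 1, 0, 1, 0]
model = [1, 1, 0, 0, 1, 0, 0, 0]
print(cohens_kappa(human, model))  # → 0.75
```

Kappa discounts the agreement two annotators would reach by chance, which matters for categories that are rare among comments: raw percent agreement can look high there even when the model rarely detects the category.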
Abstract
Lectures are a learning experience for both students and teachers. Students learn from teachers about the subject material, while teachers learn from students about how to refine their instruction. However, online student feedback is unstructured and abundant, making it challenging for teachers to learn from it and improve. We take a step towards tackling this challenge. First, we contribute a dataset for studying this problem: SIGHT is a large dataset of 288 math lecture transcripts and 15,784 comments collected from the Massachusetts Institute of Technology OpenCourseWare (MIT OCW) YouTube channel. Second, we develop a rubric for categorizing feedback types using qualitative analysis. Qualitative analysis methods are powerful in uncovering domain-specific insights; however, they are costly to apply to large data sources. To overcome this challenge, we propose a set of best practices for using…
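To put the cost argument in perspective, the reported figure of roughly $0.002 per comment can be scaled to the 15,784 comments in SIGHT. The sketch below is back-of-the-envelope arithmetic only. It assumes one labeling pass per comment at a flat per-comment price; real cost depends on the model, its pricing, comment length, and how many prompts each comment needs. The function name `estimate_cost` is hypothetical.

```python
def estimate_cost(num_comments, cost_per_comment=0.002):
    """Rough total labeling cost in US dollars.

    Assumption: a single pass over all comments at a flat per-comment price.
    """
    return num_comments * cost_per_comment


total = estimate_cost(15_784)
print(f"${total:.2f}")  # → $31.57
```

Under these assumptions, labeling the whole dataset costs on the order of tens of dollars, far less than paying human annotators to apply the rubric to every comment.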
Taxonomy
Topics: Online Learning and Analytics · Topic Modeling
