From Slides to Chatbots: Enhancing Large Language Models with University Course Materials

Tu Anh Dinh; Philipp Nicolas Schumacher; Jan Niehues

arXiv:2510.22272 · cs.CL · March 19, 2026

TL;DR

This paper explores methods to improve large language models' ability to answer university-level computer science questions by integrating course materials like slides and transcripts, using retrieval-augmented generation and multi-modal approaches.

Contribution

The paper shows that retrieval-augmented generation (RAG), especially with multi-modal slide retrieval, outperforms continual pre-training (CPT) at incorporating course-specific knowledge into LLMs.

Findings

01. RAG outperforms CPT on small, course-specific datasets.

02. Multi-modal slide retrieval with images improves accuracy.

03. Incorporating course materials enhances LLM support for education.

Abstract

Large Language Models (LLMs) have advanced rapidly in recent years. One application of LLMs is to support student learning in educational settings. However, prior work has shown that LLMs still struggle to answer questions accurately within university-level computer science courses. In this work, we investigate how incorporating university course materials can enhance LLM performance in this setting. A key challenge lies in leveraging diverse course materials such as lecture slides and transcripts, which differ substantially from typical textual corpora: slides also contain visual elements like images and formulas, while transcripts contain spoken, less structured language. We compare two strategies, Retrieval-Augmented Generation (RAG) and Continual Pre-Training (CPT), to extend LLMs with course-specific knowledge. For lecture slides, we further explore a multi-modal RAG approach,…
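To make the RAG side of the comparison concrete, here is a minimal text-only sketch, assuming course materials have already been split into short chunks. It is not the authors' pipeline: the bag-of-words cosine scoring, the function names (`tokenize`, `retrieve`, `build_prompt`), the prompt wording, and the sample chunks in `COURSE_CHUNKS` are all hypothetical stand-ins chosen for illustration. A real system would likely use a learned embedding model and, for slides, a multi-modal retriever.

```python
import math
import re
from collections import Counter

# Hypothetical course chunks (invented for illustration, not from the paper).
COURSE_CHUNKS = [
    "Slide 3: A binary search tree keeps keys in sorted order for fast lookup.",
    "Transcript: so, um, gradient descent updates the weights using the gradient of the loss.",
    "Slide 7: Dijkstra's algorithm finds shortest paths in graphs with non-negative edge weights.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (assumed scoring: bag-of-words cosine)."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(c))), i) for i, c in enumerate(chunks)]
    # Highest score first; ties broken by original chunk order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [chunks[i] for _, i in scored[:k]]


def build_prompt(question, chunks, k=2):
    """Prepend the retrieved chunks to the question; the result would be sent to an LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Course material:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("How does gradient descent update the weights?", COURSE_CHUNKS, k=1))
```

The contrast with CPT is that the model's weights are never updated here: course knowledge enters only through the prompt at inference time, which is one plausible reason retrieval can work with a small course-specific corpus.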
