Evaluating the Impact of Advanced LLM Techniques on AI-Lecture Tutors for a Robotics Course

Sebastian Kahl; Felix Löffler; Martin Maciol; Fabian Ridder; Marius Schmitz; Jennifer Spanagel; Jens Wienkamp; Christopher Burgahn; Malte Schilling

arXiv:2408.04645 · cs.CL · August 12, 2024

Open Access

TL;DR

This paper evaluates how advanced techniques like prompt engineering, RAG, and fine-tuning improve LLM-based AI tutors for university courses, highlighting benefits and challenges in educational applications.

Contribution

It demonstrates that RAG combined with prompt engineering significantly improves LLM responses and discusses the limitations of current evaluation metrics in educational contexts.

Findings

01 RAG with prompt engineering enhances factual accuracy

02 Fine-tuning produces strong but potentially overfitted models

03 Similarity metrics correlate with performance but favor shorter responses

Abstract

This study evaluates the performance of Large Language Models (LLMs) as Artificial Intelligence-based tutors for a university course. In particular, we apply advanced techniques such as prompt engineering, Retrieval-Augmented Generation (RAG), and fine-tuning. We assessed the different models and applied techniques using common similarity metrics like BLEU-4, ROUGE, and BERTScore, complemented by a small human evaluation of helpfulness and trustworthiness. Our findings indicate that RAG combined with prompt engineering significantly enhances model responses and produces better factual answers. In the context of education, RAG appears to be an ideal technique, as it enriches the model's input with additional information and material that is usually already available for a university course. Fine-tuning, on the other hand, can produce quite small, still…
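
To make the idea of "enriching the input of the model" concrete, here is a minimal sketch of how a RAG prompt might be assembled from course material. It is not the authors' implementation: the word-overlap retriever is a deliberately simple stand-in for whatever retrieval method the paper uses, and the names `retrieve`, `build_prompt`, and `COURSE_NOTES`, as well as the sample notes and prompt wording, are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, passage):
    """Count tokens shared by query and passage (toy relevance score)."""
    shared = Counter(tokenize(query)) & Counter(tokenize(passage))
    return sum(shared.values())


def retrieve(query, passages, k=2):
    """Return the k passages with the highest word overlap with the query."""
    ranked = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend the retrieved course material to the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "You are a tutor for a robotics course. "
        "Answer using only the course material below.\n"
        f"Course material:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical course notes, invented for illustration only.
COURSE_NOTES = [
    "Forward kinematics computes the end-effector pose from the joint angles.",
    "Inverse kinematics computes joint angles that reach a desired end-effector pose.",
    "A PID controller combines proportional, integral, and derivative terms.",
]

prompt = build_prompt("What does inverse kinematics compute?", COURSE_NOTES, k=1)
print(prompt)
```

The resulting string would then be passed to an LLM of choice. Only the prompt changes, not the model weights, which is why this approach can reuse lecture material that already exists for a course.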

Taxonomy

Topics: AI in Service Interactions · Robotic Process Automation Applications

Methods: Linear Layer · Attention Dropout · WordPiece · Layer Normalization · Multi-Head Attention · Linear Warmup With Linear Decay · Attention Is All You Need · Weight Decay · Adam