Comparison of Large Language Models for Generating Contextually Relevant Questions

Ivo Lodovico Molina; Valdemar Švábenský; Tsubasa Minematsu; Li Chen; Fumiya Okubo; Atsushi Shimada

arXiv:2407.20578 · cs.CL · September 17, 2024 · 1 citation

TL;DR

This paper compares three large language models for automatic question generation from educational slide text, evaluating their effectiveness in producing relevant, clear, and well-aligned questions without fine-tuning.

Contribution

It provides an analysis of LLMs' capabilities for automatic question generation in educational contexts, highlighting their relative performance and strengths.

Findings

01 GPT-3.5 and Llama 2-Chat 13B outperform Flan T5 XXL by a small margin, particularly in clarity and question-answer alignment.

02 GPT-3.5 excels at tailoring questions to answers.

03 The generated questions are suitable for educational use.

Abstract

This study explores the effectiveness of Large Language Models (LLMs) for Automatic Question Generation in educational settings. Three LLMs are compared in their ability to create questions from university slide text without fine-tuning. Questions were obtained in a two-step pipeline: first, answer phrases were extracted from slides using Llama 2-Chat 13B; then, the three models generated questions for each answer. To analyze whether the questions would be suitable in educational applications for students, a survey was conducted with 46 students who evaluated a total of 246 questions across five metrics: clarity, relevance, difficulty, slide relation, and question-answer alignment. Results indicate that GPT-3.5 and Llama 2-Chat 13B outperform Flan T5 XXL by a small margin, particularly in terms of clarity and question-answer alignment. GPT-3.5 especially excels at tailoring questions to…
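The two-step pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `TextModel` interface, the function names `extract_answers` and `generate_questions`, and both prompt strings are assumptions made for this example, and the stub lambdas stand in for real calls to the extraction model and the three question generators.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any callable mapping a prompt string to generated text.
TextModel = Callable[[str], str]


def extract_answers(slide_text: str, extractor: TextModel) -> List[str]:
    """Step 1: ask one model for candidate answer phrases, one per line."""
    # The prompt wording is an assumption, not taken from the paper.
    prompt = "List the key answer phrases in this slide, one per line:\n" + slide_text
    lines = extractor(prompt).splitlines()
    return [line.strip() for line in lines if line.strip()]


def generate_questions(
    slide_text: str, answers: List[str], generators: Dict[str, TextModel]
) -> List[Dict[str, str]]:
    """Step 2: every generator writes one question per extracted answer."""
    rows = []
    for answer in answers:
        for name, model in generators.items():
            # The prompt wording is an assumption, not taken from the paper.
            prompt = (
                f"Slide: {slide_text}\nAnswer: {answer}\n"
                "Write a question whose answer is the phrase above."
            )
            rows.append(
                {"model": name, "answer": answer, "question": model(prompt).strip()}
            )
    return rows


if __name__ == "__main__":
    # Stub models stand in for real LLM calls so the sketch runs offline.
    slide = "A stack is LIFO. A queue is FIFO."
    stub_extractor = lambda prompt: "stack\nqueue\n"
    stub_generators = {
        "model_a": lambda prompt: "Which structure is described? ",
        "model_b": lambda prompt: "What is being defined here?",
    }
    answers = extract_answers(slide, stub_extractor)
    for row in generate_questions(slide, answers, stub_generators):
        print(row)
```

Keeping the models behind a plain callable means the same answer list is fed to every generator, so the resulting questions can be compared per answer, as in the survey setup the abstract describes.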

Code & Models

Repositories

limu-research/2024-ectel-qg
PyTorch · Official

Taxonomy

Topics: Advanced Text Analysis Techniques · Topic Modeling · Expert finding and Q&A systems

Methods: Attention Is All You Need · Inverse Square Root Schedule · Dropout · Cosine Annealing · Adafactor · Attention Dropout · SentencePiece · Adam · Linear Layer