Question: How do Large Language Models perform on the Question Answering tasks? Answer:

Kevin Fischer; Darren Fürst; Sebastian Steindl; Jakob Lindner; Ulrich Schäfer

arXiv:2412.12893·cs.CL·December 18, 2024

TL;DR

This study compares the performance of smaller fine-tuned models and large instruction-following LLMs on question-answering tasks, highlighting their strengths and limitations in both fine-tuned and out-of-distribution scenarios.

Contribution

It introduces a single-inference prompting method for unanswerable questions and evaluates model generalization across different QA datasets without fine-tuning.

Findings

01. Smaller fine-tuned models outperform SOTA LLMs on the QA task they were fine-tuned on.

02. Recent SOTA models close the gap on out-of-distribution datasets and outperform fine-tuned models on most datasets.

03. Single-inference prompting effectively handles unanswerable questions, saving compute time and resources.

Abstract

Large Language Models (LLMs) have shown promising results on various NLP tasks without needing explicit training for them, by using few-shot or zero-shot prompting techniques. A common NLP task is question answering (QA). In this study, we present a comprehensive performance comparison between smaller fine-tuned models and out-of-the-box instruction-following LLMs on the Stanford Question Answering Dataset 2.0 (SQuAD2), specifically when using a single-inference prompting technique. Since the dataset contains unanswerable questions, previous work used a double-inference method. We propose a prompting style that aims to elicit the same ability without the need for double inference, saving compute time and resources. Furthermore, we investigate the models' generalization capabilities by comparing their performance on similar but different QA datasets, without fine-tuning…
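As a rough illustration of what a single-inference prompt for unanswerable questions could look like, the sketch below builds one prompt that asks the model either to answer from the context or to reply with a fixed sentinel, then parses the reply. The prompt wording, the `UNANSWERABLE` sentinel, and the helper names `build_prompt` and `parse_answer` are all hypothetical; the abstract does not give the authors' actual prompt, so this is an assumption-laden sketch rather than their method.

```python
from typing import Optional

# Hypothetical sentinel the model is asked to emit when the context has no answer.
UNANSWERABLE = "unanswerable"


def build_prompt(context: str, question: str) -> str:
    """Build one prompt covering both answering and abstaining (illustrative wording)."""
    return (
        "Answer the question using only the context below. "
        f"If the context does not contain the answer, reply with exactly '{UNANSWERABLE}'.\n\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def parse_answer(raw: str) -> Optional[str]:
    """Return None if the model abstained, otherwise the trimmed answer text."""
    text = raw.strip()
    # Tolerate casing, surrounding quotes, and a trailing period around the sentinel.
    if text.strip(" .'\"").lower() == UNANSWERABLE:
        return None
    return text
```

With this shape, each question costs one model call: the string from `build_prompt` is sent once, and `parse_answer` maps the reply to either an answer span or `None` for "no answer".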

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques