Evaluation of Instruction-Following Ability for Large Language Models on Story-Ending Generation

Rem Hida; Junki Ohmura; Toshiyuki Sekiya

arXiv:2406.16356·cs.CL·June 25, 2024

TL;DR

This paper introduces an automatic evaluation method for assessing how well large language models follow instructions in story-ending generation, and shows that open-source models follow instructions nearly as well as GPT-3.5.

Contribution

The paper proposes a novel automatic evaluation pipeline, based on machine reading comprehension (MRC), for measuring instruction-following ability in story-ending generation, and validates it against human judgments.

Findings

01

The proposed metric aligns well with human evaluation.

02

Open-source LLMs nearly match GPT-3.5 in instruction-following performance.

03

The evaluation method is effective for diverse, context-specific instructions.
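The listing states that the metric aligns with human evaluation but does not say how agreement was quantified. As an illustration only, one common statistic for comparing two sets of binary judgments (here, "the ending follows the instruction" vs. not) is Cohen's kappa. The function name and the example labels below are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa between two binary raters labeling the same items.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items where both raters give the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal rate of saying True.
    pa = sum(a) / n
    pb = sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Both raters use a single constant label: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)


# Hypothetical judgments: automatic metric vs. human annotators.
metric_labels = [True, True, False, False, True, False]
human_labels = [True, True, False, True, True, False]
print(round(cohens_kappa(metric_labels, human_labels), 3))  # prints 0.667
```

Kappa is used here rather than raw accuracy because it discounts agreement expected by chance; a correlation measure over graded scores would be an equally reasonable choice if the judgments were not binary.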

Abstract

Instruction-tuned Large Language Models (LLMs) have achieved remarkable performance across various benchmark tasks. While providing instructions to guide LLM generations is user-friendly, assessing their instruction-following capabilities remains an open problem due to a lack of evaluation metrics. In this paper, we focus on evaluating the instruction-following ability of LLMs in the context of story-ending generation, which requires diverse and context-specific instructions. We propose an automatic evaluation pipeline that uses a machine reading comprehension (MRC) model to determine whether the generated story-ending reflects the instruction. Our findings demonstrate that our proposed metric aligns with human evaluation. Furthermore, our experiments confirm that recent open-source LLMs can achieve instruction-following performance close to GPT-3.5, as assessed through…
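The abstract describes using an MRC model to decide whether each generated ending reflects its instruction and then aggregating those decisions into a score. The sketch below shows only that aggregation step, under stated assumptions: the reader interface (a function returning a support score in [0, 1]), the 0.5 threshold, and all names are hypothetical, and `toy_reader` is a word-overlap stand-in, not an MRC model. The paper's actual query formulation and decision rule may differ.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: reads `passage` and returns a score in [0, 1] for how
# well it supports `query`. A real pipeline would call an MRC model here.
ReaderFn = Callable[[str, str], float]


def reflects_instruction(ending: str, instruction: str, reader: ReaderFn,
                         threshold: float = 0.5) -> bool:
    """Judge one generated ending: True if the reader's score meets the threshold."""
    return reader(ending, instruction) >= threshold


def instruction_following_rate(pairs: Sequence[Tuple[str, str]], reader: ReaderFn,
                               threshold: float = 0.5) -> float:
    """Fraction of (ending, instruction) pairs judged as following the instruction."""
    if not pairs:
        raise ValueError("need at least one (ending, instruction) pair")
    hits = sum(reflects_instruction(e, i, reader, threshold) for e, i in pairs)
    return hits / len(pairs)


def toy_reader(passage: str, query: str) -> float:
    """Stand-in for an MRC model: share of query words that also appear in the passage."""
    q = {w.strip(".,!?").lower() for w in query.split()}
    p = {w.strip(".,!?").lower() for w in passage.split()}
    q.discard("")
    return len(q & p) / len(q) if q else 0.0


# Hypothetical generated endings paired with the instruction they were given.
pairs = [
    ("Tom finally found his lost dog in the park.", "Tom finds his dog."),
    ("Tom went home and slept.", "Tom finds his dog."),
]
print(instruction_following_rate(pairs, toy_reader))  # prints 0.5
```

Keeping the reader as a plain function argument means a real MRC model can be swapped in without changing the aggregation logic.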

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Educational Games and Gamification

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Residual Connection · Multi-Head Attention · Weight Decay · Softmax · Layer Normalization