Evaluation of Instruction-Following Ability for Large Language Models on Story-Ending Generation
Rem Hida, Junki Ohmura, Toshiyuki Sekiya

TL;DR
This paper introduces an automatic evaluation method for assessing how well large language models follow instructions in story-ending generation, and shows that recent open-source models come close to GPT-3.5 in instruction-following performance.
Contribution
The paper proposes a novel MRC-based automatic evaluation pipeline for instruction-following ability in story generation tasks, validated against human judgments.
Findings
The proposed metric aligns well with human evaluation.
Open-source LLMs nearly match GPT-3.5 in instruction-following performance.
The evaluation method is effective for diverse, context-specific instructions.
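The summary does not say which agreement statistic backs the finding that the metric aligns with human evaluation. The sketch below shows one common way to check such alignment. It assumes that both the automatic metric and human annotators give a binary "follows the instruction / does not" judgment per generated ending. It then reports raw agreement and Cohen's kappa. The function name `agreement` and the binary-label setup are illustrative assumptions, not the paper's protocol.

```python
from typing import Sequence, Tuple


def agreement(metric: Sequence[bool], human: Sequence[bool]) -> Tuple[float, float]:
    """Return (raw agreement, Cohen's kappa) for two binary label sequences.

    Assumption: each position is one generated story ending, labelled True if
    judged to follow its instruction (by the metric or by a human annotator).
    """
    if len(metric) != len(human) or not metric:
        raise ValueError("label sequences must be non-empty and equal length")
    n = len(metric)
    # Observed agreement: share of endings where metric and human agree.
    observed = sum(m == h for m, h in zip(metric, human)) / n
    p_metric = sum(metric) / n  # share labelled "follows" by the metric
    p_human = sum(human) / n    # share labelled "follows" by annotators
    # Agreement expected by chance given each rater's label frequencies.
    expected = p_metric * p_human + (1 - p_metric) * (1 - p_human)
    if expected == 1.0:
        # Both raters used a single label throughout; kappa is undefined.
        return observed, float("nan")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


acc, kappa = agreement([True, True, False, False], [True, False, False, False])
print(acc, kappa)  # 0.75 0.5
```

Kappa is used here because raw agreement alone can look high when most endings share one label. A correlation coefficient would be the analogous choice if the metric produced continuous scores instead.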
Abstract
Instruction-tuned Large Language Models (LLMs) have achieved remarkable performance across various benchmark tasks. Giving LLMs instructions to guide their generations is user-friendly. However, how well LLMs actually follow those instructions remains unclear because suitable evaluation metrics are lacking. In this paper, we focus on evaluating the instruction-following ability of LLMs in the context of story-ending generation, which requires diverse and context-specific instructions. We propose an automatic evaluation pipeline that uses a machine reading comprehension (MRC) model to determine whether a generated story ending reflects the given instruction. Our findings demonstrate that our proposed metric aligns with human evaluation. Furthermore, our experiments confirm that recent open-source LLMs can achieve instruction-following performance close to GPT-3.5, as assessed through…
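The following is a minimal Python sketch of an MRC-based instruction check of the kind the abstract describes. The summary does not specify how instructions are turned into MRC inputs, so every design choice here is an assumption:

* Each instruction is paired with a hypothetical question and expected answer (`InstructionProbe`).
* The MRC model is any callable that returns an answer span and a confidence (`MRCModel`).
* An ending counts as following its instruction when the extracted answer matches the expected one above a confidence threshold.

`toy_mrc` is a keyword-matching placeholder, not a real reading-comprehension model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical MRC interface: (question, context) -> (answer span, confidence).
MRCModel = Callable[[str, str], Tuple[str, float]]


@dataclass
class InstructionProbe:
    question: str         # question derived from the instruction (hypothetical)
    expected_answer: str  # answer a compliant story ending should yield


def normalize(text: str) -> str:
    """Lowercase, trim surrounding punctuation, and collapse whitespace."""
    return " ".join(text.lower().strip(" .!?,").split())


def follows_instruction(ending: str, probe: InstructionProbe,
                        mrc: MRCModel, min_conf: float = 0.5) -> bool:
    """Ask the MRC model the probe question, using the generated ending as context."""
    answer, conf = mrc(probe.question, ending)
    return conf >= min_conf and normalize(answer) == normalize(probe.expected_answer)


def instruction_following_rate(endings: List[str], probes: List[InstructionProbe],
                               mrc: MRCModel) -> float:
    """Fraction of generated endings judged to reflect their instruction."""
    if len(endings) != len(probes) or not endings:
        raise ValueError("need one probe per ending, and at least one ending")
    hits = sum(follows_instruction(e, p, mrc) for e, p in zip(endings, probes))
    return hits / len(endings)


# Toy stand-in for a real MRC model: returns the first known place mentioned.
PLACES = ["the castle", "the forest", "home"]


def toy_mrc(question: str, context: str) -> Tuple[str, float]:
    lowered = context.lower()
    for place in PLACES:
        if place in lowered:
            return place, 0.9
    return "", 0.0


probe = InstructionProbe("Where does Mia end up?", "home")
endings = ["Finally, Mia walked back home.", "Mia got lost in the forest."]
rate = instruction_following_rate(endings, [probe, probe], toy_mrc)
print(rate)  # 0.5
```

In practice, the placeholder would be replaced by an extractive QA model. One option is a Hugging Face `transformers` question-answering pipeline, wrapped so that it returns `(answer, score)`. The exact-match rule is likewise only one possible scoring choice; a token-overlap score would be more lenient toward paraphrased answers.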
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Educational Games and Gamification
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Residual Connection · Multi-Head Attention · Weight Decay · Softmax · Layer Normalization
