5W1H Extraction With Large Language Models

Yang Cao; Yangsong Lan; Feiyan Zhai; Piji Li

arXiv:2405.16150 · cs.CL · May 28, 2024

TL;DR

This paper introduces a high-quality 5W1H dataset for news articles and compares various prompting and fine-tuning strategies, demonstrating improved extraction performance over ChatGPT and exploring domain adaptation capabilities.

Contribution

The paper provides a new annotated dataset for 5W1H extraction and evaluates multiple strategies, including fine-tuning, for improved performance over existing LLMs.

Findings

01. Fine-tuned models outperform ChatGPT on 5W1H extraction.

02. High-quality annotated datasets enhance extraction accuracy.

03. Domain adaptation shows promising transferability across news corpora.

Abstract

The extraction of essential news elements through the 5W1H framework (What, When, Where, Why, Who, and How) is critical for event extraction and text summarization. The advent of large language models (LLMs) such as ChatGPT presents an opportunity to address language-related tasks through simple prompts, without time-consuming fine-tuning. However, ChatGPT has encountered challenges in processing longer news texts and analyzing specific attributes in context, especially when answering questions about What, Why, and How. The effectiveness of extraction tasks is notably dependent on high-quality human-annotated datasets, yet the absence of such datasets for 5W1H extraction increases the difficulty of fine-tuning strategies based on open-source LLMs. To address these limitations, first, we annotate a…
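
To make the idea of prompt-based 5W1H extraction concrete, here is a minimal Python sketch of a zero-shot prompt and a tolerant parser for the reply. It is an illustration only, not the prompt or pipeline used in the paper: the names `FIELDS`, `build_prompt`, and `parse_response`, the prompt wording, and the JSON output format are all assumptions, and the call to an actual model is left out.

```python
import json

# The six 5W1H elements named in the abstract (lowercase keys are an assumption).
FIELDS = ("what", "when", "where", "why", "who", "how")


def build_prompt(article: str) -> str:
    """Return a zero-shot prompt asking for the six elements as one JSON object."""
    keys = ", ".join(f'"{name}"' for name in FIELDS)
    return (
        "Read the news article below and extract its 5W1H elements.\n"
        f"Answer with a single JSON object with the keys {keys}.\n"
        "Use an empty string for any element the article does not state.\n\n"
        f"Article:\n{article}"
    )


def parse_response(text: str) -> dict:
    """Parse a model reply into a dict holding exactly the six 5W1H keys.

    Malformed or non-object replies yield empty strings for every key,
    so downstream code can always rely on the same shape.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Missing or null values become empty strings; extra keys are dropped.
    return {name: str(data.get(name) or "").strip() for name in FIELDS}
```

In use, the string from `build_prompt(article)` would be sent to whichever model is being evaluated, and the reply passed to `parse_response`. Fixing the output to six named keys keeps results comparable across prompting and fine-tuning setups, though a real evaluation would also need a scoring scheme, which this sketch does not attempt.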

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis