Are LLMs Good Annotators for Discourse-level Event Relation Extraction?

Kangda Wei; Aayush Gautam; Ruihong Huang

arXiv:2407.19568 · cs.CL · February 25, 2025


TL;DR

This paper evaluates how well large language models such as GPT-3.5 and LLaMA-2 perform discourse-level event relation extraction. It finds that they fall well short of supervised models, even though fine-tuning brings some improvement.

Contribution

The paper provides a comprehensive assessment of LLMs for discourse-level event relation extraction. It highlights their weaknesses and shows that supervised fine-tuning does not scale as well as a smaller supervised baseline model.

Findings

01. LLMs underperform supervised baselines.
02. Fine-tuning improves LLM performance but does not scale well.
03. LLMs tend to fabricate event mentions and struggle with complex relations.
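The third finding, fabricated event mentions, can be checked mechanically when model outputs are plain strings. The following is a minimal sketch under stated assumptions, not the paper's evaluation code. It assumes a predicted mention counts as fabricated if it does not appear verbatim (whole word, case-sensitive) in the source document. `is_grounded` and `fabrication_rate` are hypothetical names.

```python
import re


def is_grounded(mention: str, document: str) -> bool:
    """Return True if `mention` appears in `document` as a whole-word, case-sensitive match."""
    mention = mention.strip()
    if not mention:
        return False  # an empty prediction cannot be grounded in the text
    # \b anchors stop "storm" from matching inside "storms"; re.escape keeps
    # punctuation in the mention from being read as regex syntax.
    pattern = r"\b" + re.escape(mention) + r"\b"
    return re.search(pattern, document) is not None


def fabrication_rate(mentions: list[str], document: str) -> float:
    """Fraction of predicted event mentions that cannot be found in the document."""
    if not mentions:
        return 0.0
    fabricated = [m for m in mentions if not is_grounded(m, document)]
    return len(fabricated) / len(mentions)


if __name__ == "__main__":
    doc = "The storm hit the coast and caused flooding."
    # "evacuation" never occurs in the text, so 1 of 4 mentions is fabricated.
    print(fabrication_rate(["storm", "hit", "flooding", "evacuation"], doc))  # 0.25
```

The whole-word check assumes that mentions begin and end with word characters. A looser substring match would also accept fragments such as "storm" found inside "storms".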

Abstract

Large Language Models (LLMs) have demonstrated proficiency in a wide array of natural language processing tasks. However, their effectiveness on discourse-level event relation extraction (ERE) tasks remains unexplored. In this paper, we assess the effectiveness of LLMs in addressing discourse-level ERE tasks characterized by lengthy documents and intricate relations encompassing coreference, temporal, causal, and subevent types. Evaluation is conducted using a commercial model, GPT-3.5, and an open-source model, LLaMA-2. Our study reveals a notable underperformance of LLMs compared to the baseline established through supervised learning. Although Supervised Fine-Tuning (SFT) can improve LLM performance, it does not scale well compared to the smaller supervised baseline model. Our quantitative and qualitative analysis shows that LLMs have several weaknesses when applied for extracting…
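The abstract compares LLMs with a supervised baseline across four relation families but does not specify how predictions are scored. The following is therefore a minimal sketch under assumptions. Each relation is treated as a directed (head event, tail event, type) triple and scored by exact match, both overall and per type. The names `Relation`, `prf`, and `prf_by_type` are hypothetical. Real ERE evaluation may differ. For example, coreference is commonly scored over clusters with metrics such as MUC or B³ rather than as pairs.

```python
# Hypothetical representation: (head_event_id, tail_event_id, relation_type).
Relation = tuple[str, str, str]


def prf(gold: set[Relation], pred: set[Relation]) -> tuple[float, float, float]:
    """Precision, recall and F1 from exact matching of relation triples."""
    tp = len(gold & pred)  # a predicted triple counts only if it is identical to a gold one
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def prf_by_type(gold: set[Relation], pred: set[Relation]) -> dict[str, tuple[float, float, float]]:
    """Score each relation type (e.g. causal, temporal) separately."""
    types = sorted({r[2] for r in gold | pred})
    return {
        t: prf({r for r in gold if r[2] == t}, {r for r in pred if r[2] == t})
        for t in types
    }


if __name__ == "__main__":
    gold = {("e1", "e2", "causal"), ("e2", "e3", "temporal"), ("e1", "e3", "temporal")}
    pred = {("e1", "e2", "causal"), ("e3", "e2", "temporal"), ("e1", "e4", "subevent")}
    print(prf_by_type(gold, pred))
    # {'causal': (1.0, 1.0, 1.0), 'subevent': (0.0, 0.0, 0.0), 'temporal': (0.0, 0.0, 0.0)}
```

Under exact matching, the reversed pair ("e3", "e2") counts as an error. This strictness is one reason long, relation-dense documents are hard for generative models that emit relations as free text.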

Code & Models

Repositories

WeiKangda/LLM-ERE
PyTorch · Official

Videos

Are LLMs Good Annotators for Discourse-level Event Relation Extraction? · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies

Methods: Attention Is All You Need · Dropout · Cosine Annealing · Attention Dropout · Adam · Linear Layer · Byte Pair Encoding · Layer Normalization