ARC-NLP at PAN 2023: Hierarchical Long Text Classification for Trigger Detection

Umitcan Sahin; Izzet Emre Kucukkaya; Cagri Toraman

arXiv:2307.14912·cs.CL·July 28, 2023

TL;DR

This paper presents a hierarchical long text classification method for detecting triggering content in fanfiction, combining Transformer fine-tuning and LSTM models to improve detection accuracy.

Contribution

The authors introduce a novel hierarchical model that integrates Transformer-based language models with LSTM for multi-label trigger detection in long texts.

Findings

01 Achieved an F1-macro score of 0.372 on the validation set

02 Achieved an F1-micro score of 0.736 on the validation set

03 Outperformed the baseline results at PAN CLEF 2023
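
The two validation scores differ because they average differently: F1-macro computes F1 per label and weights every label equally, while F1-micro pools true positives, false positives, and false negatives across all labels, so frequent labels dominate. The sketch below illustrates the two definitions on toy multi-label data. The function names and the toy labels are hypothetical and are not taken from the paper or the shared task's evaluation code.

```python
from typing import List, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Return (macro F1, micro F1) for binary indicator matrices."""
    n_labels = len(y_true[0])
    counts = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores, each label weighted equally.
    macro = sum(_f1(*c) for c in counts) / n_labels
    # Micro: pool the counts over all labels, then compute a single F1.
    micro = _f1(
        sum(c[0] for c in counts),
        sum(c[1] for c in counts),
        sum(c[2] for c in counts),
    )
    return macro, micro


# Toy data (assumption): label 0 is frequent and always correct,
# label 1 is rare and always missed.
y_true = [[1, 0], [1, 0], [1, 0], [1, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0]]
macro, micro = multilabel_f1(y_true, y_pred)
print(round(macro, 3), round(micro, 3))  # prints: 0.5 0.889
```

In this toy case a single missed rare label halves the macro score while barely moving the micro score, which is the general pattern a macro score well below the micro score tends to indicate.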

Abstract

Fanfiction, a popular form of creative writing set within established fictional universes, has gained a substantial online following. However, ensuring the well-being and safety of participants has become a critical concern in this community. The detection of triggering content, that is, material that may cause emotional distress or trauma to readers, poses a significant challenge. In this paper, we describe our approach for the Trigger Detection shared task at PAN CLEF 2023, where the goal is to detect multiple types of triggering content in a given fanfiction document. For this, we build a hierarchical model that uses recurrence over Transformer-based language models. In our approach, we first split long documents into smaller segments and use them to fine-tune a Transformer model. Then, we extract feature embeddings from the fine-tuned Transformer model, which are used as input in the training of…
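
The first step described in the abstract, splitting a long document into smaller segments, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `split_into_segments`, the whitespace tokenization, the segment length, and the optional overlap (`stride`) are all hypothetical, since the excerpt does not specify the tokenizer or the segment size the authors used. In practice the segments would be produced with the Transformer's own subword tokenizer.

```python
from typing import List


def split_into_segments(tokens: List[str], max_len: int = 512, stride: int = 0) -> List[List[str]]:
    """Split a token list into consecutive segments of at most max_len tokens.

    stride is the number of tokens shared between neighbouring segments
    (0 means no overlap). Both values are illustrative assumptions.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    segments = []
    for start in range(0, len(tokens), step):
        segments.append(tokens[start:start + max_len])
        # Stop once the current segment reaches the end of the document.
        if start + max_len >= len(tokens):
            break
    return segments


# Whitespace splitting stands in for a real subword tokenizer (assumption).
tokens = "a b c d e f g h i j".split()
segments = split_into_segments(tokens, max_len=4, stride=1)
print(segments)
# prints: [['a', 'b', 'c', 'd'], ['d', 'e', 'f', 'g'], ['g', 'h', 'i', 'j']]
```

Each segment would then be passed to the fine-tuned Transformer to obtain one embedding per segment, and the resulting sequence of embeddings would serve as the input to the recurrent stage of the hierarchical model.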

Taxonomy

Topics: Comics and Graphic Narratives · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Tanh Activation · Byte Pair Encoding · Linear Layer · Softmax · Layer Normalization · Dense Connections · Dropout · Sigmoid Activation