Improving Joint Layer RNN based Keyphrase Extraction by Using Syntactical Features

Miftahul Mahfuzh; Sidik Soleman; Ayu Purwarianti

arXiv:2009.07119·cs.CL·September 16, 2020

TL;DR

This paper enhances a joint layer RNN (JRNN) for Indonesian Twitter keyphrase extraction by incorporating syntactical features and data augmentation, resulting in improved accuracy and F1 scores.

Contribution

It introduces a modified JRNN model that uses syntactical features and data augmentation for better keyphrase extraction from social media texts.

Findings

01 Achieved 0.9597 accuracy and 0.7691 F1 score.

02 Outperformed baseline keyphrase extraction methods.

03 Effective use of syntactical features and data augmentation.

Abstract

Keyphrase extraction, the task of identifying important words or phrases in a text, is a crucial step in identifying the main topics when analyzing text from a social media platform. In our study, we focus on text written in Indonesian and taken from Twitter. Unlike the original joint layer recurrent neural network (JRNN), which outputs one sequence of keywords and uses only word embeddings, we propose modifying the input layer of JRNN to extract more than one sequence of keywords by adding syntactical features, namely part of speech, named entity types, and dependency structures. Since JRNN in general requires a large number of training examples and creating those examples is expensive, we used a data augmentation method to increase the number of training examples. Our experiments showed that our method outperformed the baseline methods.…
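
The input-layer change described in the abstract can be pictured as concatenating each token's word embedding with encodings of its syntactical features. The following is a minimal sketch under stated assumptions, not the authors' implementation: the tag inventories, the one-hot encoding, the toy embeddings, and the names `one_hot` and `build_input` are all hypothetical, and the abstract does not say how the features are actually encoded or combined.

```python
from typing import Dict, List, Sequence

# Hypothetical tag inventories, chosen only for illustration.
# The tag sets used in the paper are not given in the abstract.
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]
NE_TYPES = ["O", "PERSON", "LOCATION", "ORGANIZATION"]
DEP_RELS = ["root", "nsubj", "obj", "other"]


def one_hot(value: str, inventory: Sequence[str]) -> List[float]:
    """Return a one-hot vector for `value`; raises ValueError if it is unknown."""
    index = list(inventory).index(value)
    return [1.0 if i == index else 0.0 for i in range(len(inventory))]


def build_input(
    tokens: List[Dict[str, str]],
    embeddings: Dict[str, List[float]],
    embedding_dim: int,
) -> List[List[float]]:
    """Concatenate word embedding + POS + NE type + dependency relation per token.

    Out-of-vocabulary words get an all-zero embedding (an assumption of this sketch).
    """
    rows = []
    for tok in tokens:
        word_vec = embeddings.get(tok["word"], [0.0] * embedding_dim)
        rows.append(
            list(word_vec)
            + one_hot(tok["pos"], POS_TAGS)
            + one_hot(tok["ne"], NE_TYPES)
            + one_hot(tok["dep"], DEP_RELS)
        )
    return rows


# Toy 3-dimensional embeddings (made-up values, not trained vectors).
toy_embeddings = {"jakarta": [0.1, 0.2, 0.3]}

toy_tokens = [
    {"word": "jakarta", "pos": "NOUN", "ne": "LOCATION", "dep": "nsubj"},
    {"word": "macet", "pos": "ADJ", "ne": "O", "dep": "root"},  # OOV in the toy table
]

features = build_input(toy_tokens, toy_embeddings, embedding_dim=3)
```

Each row of `features` is one time step of the sequence that a recurrent layer would consume. In this sketch the row width is the embedding size plus the sizes of the three tag inventories; learned feature embeddings would be an equally plausible alternative to one-hot vectors.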
