Char-RNN and Active Learning for Hashtag Segmentation

Taisiya Glushkova; Ekaterina Artemova

arXiv:1911.03270·cs.CL·October 4, 2023

TL;DR

This paper investigates using character RNNs combined with active learning to perform hashtag segmentation across languages without manual annotation, leveraging synthetic data and informative subset selection.

Contribution

It introduces a language-agnostic method that generates synthetic training data and employs active learning to improve hashtag segmentation accuracy.

Findings

01. Effective segmentation across languages with different inflection levels.

02. Synthetic data generation reduces manual annotation needs.

03. Active learning enhances training efficiency.

Abstract

We explore the ability of a character recurrent neural network (char-RNN) to segment hashtags. To avoid any manual annotation, we generate a synthetic training dataset from frequent n-grams that satisfy predefined morpho-syntactic patterns. An active learning strategy limits the size of the training dataset by selecting an informative training subset. The approach does not require any language-specific settings and is evaluated on two languages that differ in degree of inflection.
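The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the labeling scheme or the informativeness criterion, so both are assumptions here: each character gets a binary label marking whether a word boundary follows it, and informativeness is measured as the mean binary entropy of per-character boundary probabilities. The function names (`make_example`, `segment`, `mean_entropy`, `select_informative`) are hypothetical, and the probabilities stand in for the output of a char-RNN that is not implemented here.

```python
import math
from typing import List, Tuple


def make_example(ngram: List[str]) -> Tuple[str, List[int]]:
    """Glue the words of an n-gram into a hashtag body and label each character.

    Assumed scheme: label 1 if a word boundary follows the character, else 0.
    The last character of the hashtag is labeled 0 (nothing follows it).
    """
    text = "".join(ngram)
    labels: List[int] = []
    for i, word in enumerate(ngram):
        is_last_word = i == len(ngram) - 1
        for j in range(len(word)):
            ends_word = j == len(word) - 1
            labels.append(1 if ends_word and not is_last_word else 0)
    return text, labels


def segment(text: str, labels: List[int]) -> List[str]:
    """Split a hashtag body back into words using per-character boundary labels."""
    words, current = [], []
    for ch, label in zip(text, labels):
        current.append(ch)
        if label == 1:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def mean_entropy(probs: List[float]) -> float:
    """Average binary entropy (in bits) of boundary probabilities."""
    if not probs:
        return 0.0
    total = 0.0
    for p in probs:
        for q in (p, 1.0 - p):
            if q > 0.0:  # skip zero terms, since 0 * log(0) is taken as 0
                total -= q * math.log2(q)
    return total / len(probs)


def select_informative(pool: List[Tuple[str, List[float]]], k: int) -> List[str]:
    """Return the k examples the model is least certain about (assumed criterion)."""
    ranked = sorted(pool, key=lambda item: mean_entropy(item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    text, labels = make_example(["active", "learning"])
    print(text, labels)
    print(segment(text, labels))
```

In an actual active learning loop, the selected examples would be added to the training subset, the model retrained, and the pool rescored; that loop is omitted here because the abstract gives no details about it.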

Code & Models

Datasets

ruanchaves/nru_hse (dataset, 71 downloads)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications