Char-RNN and Active Learning for Hashtag Segmentation
Taisiya Glushkova, Ekaterina Artemova

TL;DR
This paper investigates character RNNs combined with active learning for hashtag segmentation without manual annotation, using synthetic training data and informative subset selection. The approach is compared on two languages.
Contribution
It introduces a language-agnostic method that generates synthetic training data and employs active learning to improve hashtag segmentation accuracy.
Findings
The approach is applied to two languages that differ in degree of inflection, with no language-specific settings.
Synthetic data generation reduces manual annotation needs.
Active learning enhances training efficiency.
Abstract
We explore the abilities of a character recurrent neural network (char-RNN) for hashtag segmentation. Our approach is as follows: to avoid any manual annotation, we generate a synthetic training dataset from frequent n-grams that satisfy predefined morpho-syntactic patterns. An active learning strategy limits the size of the training dataset and selects an informative training subset. The approach does not require any language-specific settings and is compared on two languages that differ in degree of inflection.
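The two data-side ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the labeling scheme or the informativeness criterion, so the per-character boundary labels, the mean-entropy uncertainty score, and all function names (`make_example`, `segment`, `mean_entropy`, `select_informative`) are assumptions made for illustration. The `predict` argument stands in for a trained char-RNN that returns one boundary probability per character.

```python
import math

def make_example(words):
    """Build a synthetic training pair from an n-gram (list of non-empty words).

    Assumed labeling scheme: label 1 marks a character that is followed by a
    word boundary, label 0 marks every other character.
    """
    text = "".join(words)
    labels = []
    for i, word in enumerate(words):
        is_last = i == len(words) - 1
        labels.extend([0] * (len(word) - 1) + [0 if is_last else 1])
    return text, labels

def segment(text, labels):
    """Split text after every character whose label is 1."""
    parts, start = [], 0
    for i, label in enumerate(labels):
        if label == 1:
            parts.append(text[start:i + 1])
            start = i + 1
    parts.append(text[start:])
    return parts

def mean_entropy(probs):
    """Average binary entropy (in bits) of per-character boundary probabilities."""
    def h(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return sum(h(p) for p in probs) / len(probs)

def select_informative(pool, predict, k):
    """Pick the k pool items the model is least certain about.

    Uncertainty sampling is one common active learning criterion; whether the
    paper uses it is an assumption of this sketch.
    """
    return sorted(pool, key=lambda t: mean_entropy(predict(t)), reverse=True)[:k]
```

In use, `make_example(["active", "learning"])` yields the unsegmented string together with its boundary labels, and `segment` inverts that mapping, so the same label format serves both for training targets and for decoding model output.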
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
