TL;DR
This paper empirically evaluates character-based neural models for named-entity recognition (NER) on Indonesian conversational text. It shows that they outperform word-only models, particularly on out-of-vocabulary (OOV) words, and remain robust when the OOV rate is high.
Contribution
It provides the first empirical assessment of character-based models for conversational NER in Indonesian, highlighting their effectiveness in OOV scenarios.
Findings
Character models outperform word-only models by up to 4 F1 points.
Character models improve OOV case performance by up to 15 F1 points.
Character models are robust against high OOV rates.
Abstract
Despite the long history of the named-entity recognition (NER) task in the natural language processing community, previous work has rarely studied it on conversational texts. Such texts are challenging because they contain many word variations, which increase the number of out-of-vocabulary (OOV) words. This high number of OOV words poses a difficulty for word-based neural models. Meanwhile, there is plenty of evidence for the effectiveness of character-based neural models in mitigating the OOV problem. We report an empirical evaluation of neural sequence labeling models with character embeddings on the NER task in Indonesian conversational texts. Our experiments show that (1) character models outperform word embedding-only models by up to 4 points, (2) character models perform better in OOV cases with an improvement of as high as 15 points, and (3) character models are…
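To make the OOV problem described in the abstract concrete, here is a minimal sketch in Python. It is not the paper's model or evaluation code: the token lists are hypothetical toy examples of informal Indonesian spellings, and `oov_rate` and `char_ngrams` are illustrative helper names. The character n-grams only stand in for the general idea that subword information can link a spelling variant to a known word; the paper itself uses neural character embeddings.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens whose surface form never appears in training."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


def char_ngrams(word, n=3):
    """Character n-grams with boundary markers (a simple subword view of a word)."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


# Hypothetical data: a formal training sentence and an informal variant of it.
train = ["saya", "mau", "pesan", "tiket", "ke", "jakarta"]
test = ["sy", "mau", "pesen", "tiket", "ke", "jkt"]

rate = oov_rate(train, test)

# "pesen" is OOV at the word level, yet it shares character n-grams with "pesan".
shared = set(char_ngrams("pesan")) & set(char_ngrams("pesen"))

print(f"OOV rate: {rate:.2f}")
print(f"Shared trigrams: {sorted(shared)}")
```

A word-only model would have to map "pesen" to a generic unknown-word vector, whereas a model with access to characters can still see that it overlaps with "pesan".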
