TL;DR
This paper presents a hierarchical model that combines character and word-level representations to improve language identification in social media messages, effectively handling challenges like brevity and unconventional spelling.
Contribution
The paper introduces a novel hierarchical character-word model that enhances language identification accuracy and can detect code-switching in social media text.
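The idea of a hierarchical character-word model can be illustrated with a minimal, untrained sketch of its data flow. Characters are composed into word vectors. Word vectors are then contextualized by their neighbours. Each word receives a language distribution, and these are averaged into a message-level label. Word-level labels that disagree are flagged as possible code-switching.

This is not the authors' implementation. Several simplifications are assumptions made for illustration:

- Mean pooling stands in for a learned character encoder.
- Neighbour averaging stands in for a learned contextual word encoder.
- All weights are random.
- The label set `LANGS` and the names `word_vector`, `contextualize` and `identify` are hypothetical.

Predictions are meaningless until the parameters are trained.

```python
import math
import random

LANGS = ["en", "es"]  # hypothetical label set, for illustration only
DIM = 8               # hypothetical embedding size
rng = random.Random(0)

def rand_vec(n):
    return [rng.uniform(-1, 1) for _ in range(n)]

# Untrained output layer: one weight row per language (assumption: random init).
W = [rand_vec(DIM) for _ in LANGS]
char_emb = {}  # character embeddings, created lazily and never trained here

def char_vector(c):
    if c not in char_emb:
        char_emb[c] = rand_vec(DIM)
    return char_emb[c]

def word_vector(word):
    # Character level: build a word representation from its characters.
    # Mean pooling is a stand-in for a learned character encoder.
    vecs = [char_vector(c) for c in word.lower()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def contextualize(word_vecs, window=1):
    # Word level: mix each word vector with its neighbours so that the
    # representation depends on context (stand-in for a learned encoder).
    out = []
    for i in range(len(word_vecs)):
        lo, hi = max(0, i - window), min(len(word_vecs), i + window + 1)
        ctx = word_vecs[lo:hi]
        out.append([sum(col) / len(ctx) for col in zip(*ctx)])
    return out

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

def identify(message):
    words = message.split()
    if not words:
        raise ValueError("empty message")
    ctx = contextualize([word_vector(w) for w in words])
    # Per-word language distributions from a linear layer plus softmax.
    per_word = [softmax([sum(a * b for a, b in zip(row, v)) for row in W])
                for v in ctx]
    # Message-level distribution: average of the per-word distributions.
    msg = [sum(p[k] for p in per_word) / len(per_word) for k in range(len(LANGS))]
    word_labels = [LANGS[argmax(p)] for p in per_word]
    # Disagreeing word labels are treated as a possible code-switch.
    code_switched = len(set(word_labels)) > 1
    return LANGS[argmax(msg)], word_labels, code_switched

if __name__ == "__main__":
    print(identify("hola my friend como estas"))
```

Keeping word-level predictions, rather than pooling straight to one message label, is what makes a code-switching signal available in this sketch. The message label is simply their average.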
Findings
Outperforms strong baseline models
Effective in identifying language in brief, informal texts
Capable of revealing code-switching instances
Abstract
Social media messages' brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong baselines, and can also reveal code-switching.
