Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs
Miguel Ballesteros, Chris Dyer, Noah A. Smith

TL;DR
This paper enhances a transition-based dependency parser by replacing word lookup representations with character-based encodings using LSTMs, improving parsing for morphologically rich languages.
Contribution
It introduces a novel character-level encoding approach within a high-performance LSTM-based parser, enabling better handling of morphological variations.
Findings
Character-based representations improve parsing accuracy for morphologically rich languages
The method enables statistical sharing across similar word forms
Experiments on morphologically rich languages show that the parser benefits from character-based encodings compared with word lookup representations
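The sharing described in the findings above can be illustrated with a small sketch. It is not the paper's code: the toy vocabulary, the example word forms, and the helper names (`word_lookup_id`, `char_sequence`, `shared_prefix_len`) are all hypothetical. It only shows why a lookup table gives unseen inflected forms the same input, while a character view keeps them distinct and overlapping with known forms.

```python
# Hypothetical toy vocabulary; the word forms are illustrative only.
TRAIN_VOCAB = {"<unk>": 0, "walk": 1, "walked": 2, "talk": 3}


def word_lookup_id(word, vocab=TRAIN_VOCAB):
    """Lookup-style input: any unseen form collapses to the same <unk> id."""
    return vocab.get(word, vocab["<unk>"])


def char_sequence(word):
    """Character-style input: every form keeps its own symbol sequence."""
    return list(word)


def shared_prefix_len(a, b):
    """Count the leading characters that two word forms have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


unseen = ["walking", "talked"]
print([word_lookup_id(w) for w in unseen])     # both map to <unk>: [0, 0]
print(shared_prefix_len("walking", "walked"))  # 4 shared characters: "walk"
```

A character-based encoder reads the four characters that "walking" shares with the seen form "walked" using the same parameters, which is one way to picture the statistical sharing the summary refers to.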
Abstract
We present extensions to a continuous-state dependency parsing method that make it applicable to morphologically rich languages. Starting with a high-performance transition-based parser that uses long short-term memory (LSTM) recurrent neural networks to learn representations of the parser state, we replace lookup-based word representations with representations constructed from the orthographic representations of the words, also using LSTMs. This allows statistical sharing across word forms that are similar on the surface. Experiments for morphologically rich languages show that the parsing model benefits from incorporating the character-based encodings of words.
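As a rough illustration of building a word representation from its characters with LSTMs, here is a minimal pure-Python sketch. Several things in it are assumptions rather than details from the abstract: the `TinyLSTM` class, the one-hot character inputs, the random untrained weights, the hidden size, and the choice to concatenate a left-to-right and a right-to-left pass. It should be read as a sketch of the general idea, not as the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM over one-hot character inputs (untrained, random weights)."""

    def __init__(self, alphabet, hidden_size, seed=0):
        rng = random.Random(seed)
        self.index = {ch: i for i, ch in enumerate(alphabet)}
        self.h = hidden_size
        n_in = len(alphabet) + 1  # last slot is reserved for unknown characters

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # One input matrix, recurrent matrix, and bias per gate:
        # i = input, f = forget, o = output, c = candidate cell value.
        self.W = {g: mat(hidden_size, n_in) for g in "ifoc"}
        self.U = {g: mat(hidden_size, hidden_size) for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}

    def _gate(self, g, x_idx, h_prev):
        # With a one-hot input, W @ x is simply column x_idx of W.
        return [
            self.W[g][j][x_idx]
            + sum(self.U[g][j][k] * h_prev[k] for k in range(self.h))
            + self.b[g][j]
            for j in range(self.h)
        ]

    def encode(self, chars):
        """Run the LSTM over a character sequence and return the final hidden state."""
        h = [0.0] * self.h
        c = [0.0] * self.h
        for ch in chars:
            x = self.index.get(ch, len(self.index))
            i = [sigmoid(v) for v in self._gate("i", x, h)]
            f = [sigmoid(v) for v in self._gate("f", x, h)]
            o = [sigmoid(v) for v in self._gate("o", x, h)]
            g = [math.tanh(v) for v in self._gate("c", x, h)]
            c = [f[j] * c[j] + i[j] * g[j] for j in range(self.h)]
            h = [o[j] * math.tanh(c[j]) for j in range(self.h)]
        return h


def encode_word(word, fwd, bwd):
    """Concatenate the final states of a left-to-right and a right-to-left pass."""
    return fwd.encode(word) + bwd.encode(word[::-1])


alphabet = "abcdefghijklmnopqrstuvwxyz"
fwd = TinyLSTM(alphabet, hidden_size=4, seed=1)
bwd = TinyLSTM(alphabet, hidden_size=4, seed=2)
vec = encode_word("walking", fwd, bwd)
print(len(vec))  # 8: two passes, hidden size 4 each
```

In a parser of the kind the abstract describes, a vector like `vec` would stand in for the row that a word lookup table would otherwise supply, and the LSTM weights would be trained together with the rest of the model instead of being left random as they are here.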
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
