# Character Eyes: Seeing Language through Character-Level Taggers

**Authors:** Yuval Pinter, Marc Marone, Jacob Eisenstein

arXiv: 1903.05041 · 2019-03-13

## TL;DR

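This paper investigates how character-level LSTM part-of-speech taggers process different languages by analyzing individual hidden units in the character LSTM. It links morphological properties (synthesis and affixation preference) to the units' emergent behavior, and it examines how the balance between forward and backward hidden units affects performance.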

## Contribution

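It introduces a method for analyzing individual hidden units in character-level taggers across languages. The method aggregates unit behavior into language-level metrics, which the paper uses to examine how morphological properties influence tagger behavior and effectiveness.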
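
As a concrete illustration of per-unit analysis, here is a minimal sketch that scores one hidden unit by how far apart its mean activations are across gold POS tags, then averages the unit scores into a single language-level number. The names `unit_tag_spread` and `language_score` are hypothetical, and the scoring rule is an assumption chosen for simplicity. The paper's actual metrics are defined in the full text and are not reproduced here. The input is assumed to be one activation per token for each unit, for example the unit's value at the token's final character.

```python
from collections import defaultdict
from statistics import mean, pstdev


def unit_tag_spread(activations, tags):
    """Hypothetical per-unit score: the population standard deviation of the
    unit's mean activation for each POS tag. A unit whose average value is
    the same for every tag scores 0; larger values mean it separates tags."""
    by_tag = defaultdict(list)
    for value, tag in zip(activations, tags):
        by_tag[tag].append(value)
    tag_means = [mean(values) for values in by_tag.values()]
    return pstdev(tag_means)


def language_score(unit_activations, tags):
    """Hypothetical language-level aggregate: the average per-unit score.
    `unit_activations` holds one list of per-token activations per unit."""
    scores = [unit_tag_spread(acts, tags) for acts in unit_activations]
    return mean(scores)
```

For tags `["NOUN", "VERB", "NOUN", "VERB"]`, a unit with activations `[1.0, -1.0, 0.5, -0.5]` scores 0.75 because its sign flips between nouns and verbs. A unit that stays at 0.1 on every token scores 0. Averaging the two gives a language-level score of 0.375.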

## Key findings

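- The emergent behavior of the hidden units is linked to morphological properties, specifically synthesis and affixation preference.
- Language-level metrics built from unit behavior quantify the challenges taggers face on languages with different morphological properties.
- Changing the balance between forward and backward hidden units affects both how the units are arranged and tagger performance on these types of languages.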
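
The forward/backward balance finding can be made concrete with two small helpers. The first splits a fixed budget of character-LSTM hidden units between the two directions. The second shows which end of a word each direction reads last before producing its final state. One intuition for why the balance might interact with affixation preference is that the forward pass finishes on the suffix side and the backward pass finishes on the prefix side. That intuition is an assumption here, not a claim quoted from the paper. Both `split_units` and `final_window` are hypothetical names, and the specific budgets the paper compares are not reproduced.

```python
def split_units(total, forward_fraction):
    """Hypothetical helper: divide a fixed budget of character-LSTM hidden
    units between the forward and backward directions.
    Returns (n_forward, n_backward)."""
    if not 0.0 <= forward_fraction <= 1.0:
        raise ValueError("forward_fraction must be in [0, 1]")
    # Round to the nearest integer (Python rounds exact ties to even).
    n_forward = round(total * forward_fraction)
    return n_forward, total - n_forward


def final_window(word, k):
    """The last k characters each direction reads before emitting its final
    state, returned in normal spelling order: the forward LSTM finishes on
    the end of the word (suffix side), the backward LSTM on its start
    (prefix side)."""
    return {"forward": word[-k:], "backward": word[:k]}
```

For example, `split_units(100, 0.75)` returns `(75, 25)`, and `final_window("unhappiness", 4)` returns `{"forward": "ness", "backward": "unha"}`.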

## Abstract

Character-level models have been used extensively in recent years in NLP tasks as both supplements and replacements for closed-vocabulary token-level word representations. In one popular architecture, character-level LSTMs are used to feed token representations into a sequence tagger predicting token-level annotations such as part-of-speech (POS) tags. In this work, we examine the behavior of POS taggers across languages from the perspective of individual hidden units within the character LSTM. We aggregate the behavior of these units into language-level metrics which quantify the challenges that taggers face on languages with different morphological properties, and identify links between synthesis and affixation preference and emergent behavior of the hidden tagger layer. In a comparative experiment, we show how modifying the balance between forward and backward hidden units affects model arrangement and performance in these types of languages.
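
The architecture described in the abstract, in which character-level LSTMs feed token representations into a sequence tagger, can be sketched as follows. This is a minimal, untrained pure-Python illustration of data flow only. It is not the authors' implementation. The weights are random, and characters get lazily created random embeddings. The tagger is assumed to be a word-level BiLSTM followed by a linear scoring layer. All sizes (`char_dim`, `fwd_units`, `bwd_units`, `word_units`) and the tag set are hypothetical. Keeping `fwd_units` and `bwd_units` as separate parameters makes it possible to vary the forward/backward balance the abstract discusses.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTM:
    """Minimal unidirectional LSTM with random, untrained weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size + 1  # input, previous hidden, bias
        # 4 * hidden_size rows: input, forget, output gates, candidate cell.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(4 * hidden_size)]

    def states(self, inputs):
        """Return the hidden state after each input vector."""
        H = self.hidden_size
        h, c, out = [0.0] * H, [0.0] * H, []
        for x in inputs:
            z = list(x) + h + [1.0]
            pre = [sum(w * v for w, v in zip(row, z)) for row in self.w]
            i = [sigmoid(v) for v in pre[0:H]]
            f = [sigmoid(v) for v in pre[H:2 * H]]
            o = [sigmoid(v) for v in pre[2 * H:3 * H]]
            g = [math.tanh(v) for v in pre[3 * H:4 * H]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            out.append(h)
        return out


class CharLSTMTagger:
    """Char BiLSTM -> token vectors -> word BiLSTM -> one tag per token.
    The forward and backward character LSTMs may have different sizes."""

    def __init__(self, tags, char_dim=8, fwd_units=6, bwd_units=6,
                 word_units=5, seed=0):
        self.rng = random.Random(seed)
        self.tags = list(tags)
        self.char_dim = char_dim
        self.char_table = {}  # character -> embedding, filled lazily
        self.char_fwd = LSTM(char_dim, fwd_units, self.rng)
        self.char_bwd = LSTM(char_dim, bwd_units, self.rng)
        token_dim = fwd_units + bwd_units
        self.word_fwd = LSTM(token_dim, word_units, self.rng)
        self.word_bwd = LSTM(token_dim, word_units, self.rng)
        self.out = [[self.rng.uniform(-0.1, 0.1) for _ in range(2 * word_units)]
                    for _ in self.tags]

    def embed(self, ch):
        if ch not in self.char_table:
            self.char_table[ch] = [self.rng.uniform(-1, 1)
                                   for _ in range(self.char_dim)]
        return self.char_table[ch]

    def token_vector(self, word):
        # Concatenate the final forward and final backward character-LSTM
        # states (words must be non-empty). Reading only the final states
        # is a simplifying assumption of this sketch.
        chars = [self.embed(ch) for ch in word]
        fwd_final = self.char_fwd.states(chars)[-1]
        bwd_final = self.char_bwd.states(chars[::-1])[-1]
        return fwd_final + bwd_final

    def tag(self, sentence):
        tokens = [self.token_vector(w) for w in sentence]
        fwd = self.word_fwd.states(tokens)
        bwd = self.word_bwd.states(tokens[::-1])[::-1]  # realign to token order
        predictions = []
        for hf, hb in zip(fwd, bwd):
            feats = hf + hb
            scores = [sum(w * v for w, v in zip(row, feats)) for row in self.out]
            predictions.append(self.tags[scores.index(max(scores))])
        return predictions
```

For example, `CharLSTMTagger(tags=["NOUN", "VERB", "DET"], fwd_units=8, bwd_units=4).tag(["the", "dogs", "bark"])` returns one tag per token, built from 12-dimensional token vectors. Because nothing is trained, the predicted tags are arbitrary. The sketch only shows where the character-level hidden units sit and how an uneven forward/backward split changes their layout.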

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.05041/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1903.05041/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/1903.05041/full.md

---
Source: https://tomesphere.com/paper/1903.05041