# Unlabeled Data for Morphological Generation With Character-Based Sequence-to-Sequence Models

**Authors:** Katharina Kann, Hinrich Schütze

arXiv: 1705.06106 · 2017-07-24

## TL;DR

This paper introduces a semi-supervised approach that uses unlabeled data to improve character-based neural models for morphological reinflection, reporting gains of up to 9.9% over state-of-the-art baselines for 8 languages.

## Contribution

The paper proposes a multi-task training method that pairs morphological reinflection with an autoencoding task over unlabeled tokens or random strings, so that limited labeled data is used more effectively.

## Key findings

- Up to 9.9% accuracy improvement over baselines
- Unlabeled tokens or random strings serve as training data for an auxiliary autoencoding task
- Improvements reported for 8 different languages

## Abstract

We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection, the task of generating one inflected word form from another. This is achieved by using unlabeled tokens or random strings as training data for an autoencoding task, adapting a network for morphological reinflection, and performing multi-task training. We thus use limited labeled data more effectively, obtaining up to 9.9% improvement over state-of-the-art baselines for 8 different languages.
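
The data side of the multi-task setup described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `<AE>` marker, the tag-prefix input encoding, the function names, and the German example are all assumptions made here for clarity. The abstract states only that unlabeled tokens or random strings are used for an autoencoding task trained jointly with reinflection.

```python
import random

# Hypothetical marker telling the model "copy the input"; not taken from the paper.
AUTOENCODE_TAG = "<AE>"


def reinflection_example(source_form, target_form, target_tags):
    """Labeled example: morphological tags plus source characters -> target characters."""
    source = [f"<{tag}>" for tag in target_tags] + list(source_form)
    return source, list(target_form)


def autoencoding_example(word):
    """Unlabeled example: the network must reproduce its input character by character."""
    return [AUTOENCODE_TAG] + list(word), list(word)


def random_string(alphabet, rng, min_len=3, max_len=10):
    """Sample a random character string over the given alphabet."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(alphabet) for _ in range(length))


def build_multitask_data(labeled, unlabeled_words, n_random, seed=0):
    """Mix reinflection and autoencoding examples into one shuffled training set."""
    rng = random.Random(seed)
    # Alphabet for random strings: every character seen in the labeled forms.
    alphabet = sorted({ch for src, tgt, _ in labeled for ch in src + tgt})
    examples = [reinflection_example(s, t, tags) for s, t, tags in labeled]
    examples += [autoencoding_example(w) for w in unlabeled_words]
    examples += [
        autoencoding_example(random_string(alphabet, rng)) for _ in range(n_random)
    ]
    rng.shuffle(examples)
    return examples


# Illustrative usage with one labeled pair, two unlabeled tokens, two random strings.
labeled = [("gehen", "ging", ["V", "PST", "3", "SG"])]
data = build_multitask_data(labeled, ["Haus", "laufen"], n_random=2)
```

Because both tasks share one input and output format, a single encoder-decoder can be trained on the mixed list without architectural changes; only the leading symbols tell it which task to perform.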

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.06106/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1705.06106/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1705.06106/full.md

---
Source: https://tomesphere.com/paper/1705.06106