Learning a Formality-Aware Japanese Sentence Representation
Henry Li Xinyuan, Ray Lee, Jerry Chen, Kelly Marchisio

TL;DR
This paper introduces a method to learn Japanese sentence representations that explicitly encode formality, which can benefit downstream tasks such as translation by preserving social context alongside semantics.
Contribution
It proposes a sequence-to-sequence approach with a formality constraint and a learned formality representation, addressing the lack of annotated data for formality in Japanese.
Findings
Improved preservation of sentence formality in generated outputs
Slight enhancement in semantic preservation of input sentences
Effective adaptation to formality classification without extensive annotated data
Abstract
While the way intermediate representations are generated in encoder-decoder sequence-to-sequence models typically allows them to preserve the semantics of the input sentence, input features such as formality might be left out. Downstream tasks such as translation, however, would benefit from a sentence representation that preserves formality in addition to semantics, so that they can generate sentences with the appropriate level of social formality (the difference between speaking to a friend and speaking to a supervisor). We propose a sequence-to-sequence method for learning a formality-aware representation for Japanese sentences, in which sentence generation is conditioned on both the original representation of the input sentence and a side constraint that guides the sentence representation toward preserving formality information. Additionally, we propose augmenting…
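To make "conditioned on both the original representation and a side constraint" concrete, here is a minimal, dependency-free sketch of one common way to do it: concatenate the encoder's sentence vector with a learned formality embedding and project the result into the decoder's initial hidden state. This is an illustrative assumption, not the authors' implementation. The formality label set (`casual`, `polite`, `honorific`), the function names, and the concatenate-then-project design are all hypothetical, and random numbers stand in for trained weights.

```python
import math
import random

# Hypothetical formality levels; the paper's actual label set is not stated here.
FORMALITY_LEVELS = ("casual", "polite", "honorific")


def make_formality_table(dim, seed=0):
    """Stand-in for a learned embedding table: one vector per formality level.

    In a real model these vectors would be trained parameters; here they are
    small random values so the sketch runs without any ML framework.
    """
    rng = random.Random(seed)
    return {lvl: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for lvl in FORMALITY_LEVELS}


def condition(sentence_repr, formality, table):
    """Concatenate the sentence representation with the side-constraint embedding."""
    return list(sentence_repr) + table[formality]


def project(vec, weights, bias):
    """Linear layer followed by tanh: maps the conditioned vector to a decoder state."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
        for row, b in zip(weights, bias)
    ]


if __name__ == "__main__":
    sent_dim, form_dim, hid_dim = 4, 2, 3
    table = make_formality_table(form_dim)
    sentence_repr = [0.2, -0.5, 0.1, 0.7]  # pretend encoder output
    rng = random.Random(1)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(sent_dim + form_dim)] for _ in range(hid_dim)]
    b = [0.0] * hid_dim
    # Same sentence, different side constraint -> different decoder start state.
    print(project(condition(sentence_repr, "polite", table), W, b))
    print(project(condition(sentence_repr, "casual", table), W, b))
```

Because the formality embedding enters the decoder alongside the sentence vector, changing only the side constraint changes the decoder's starting state. In training, that dependence is what pushes the model to keep formality information available rather than discarding it.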
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
