Detecting Text Formality: A Study of Text Classification Approaches
Daryna Dementieva, Nikolay Babakov, Alexander Panchenko

TL;DR
This paper systematically compares various machine learning approaches for text formality detection across multiple languages, revealing the strengths of Char BiLSTM in monolingual settings and Transformer models in cross-lingual transfer.
Contribution
The paper presents what the authors describe as the first systematic study of formality detection methods, covering statistical, neural, and Transformer-based models, and releases the best-performing models for public use.
Findings
Char BiLSTM outperforms Transformer models in monolingual and multilingual tasks.
Transformer models are more stable for cross-lingual transfer.
The study delivers publicly available top-performing models.
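The three settings named above (monolingual, multilingual, cross-lingual) can be read as different pairings of training and test languages. The following is a minimal sketch of that reading, not the authors' code. The function name `experiment_plans`, the language codes, and the exact definition of each setting are illustrative assumptions: monolingual is taken to mean train and test on the same language, multilingual to mean train on all languages and test on each, and cross-lingual to mean train on one language and test on a different one. The paper's own protocol may differ.

```python
def experiment_plans(languages):
    """Enumerate (train_languages, test_language) pairs for each setting.

    Assumed definitions (illustrative, not taken from the paper):
      monolingual   -> train and test on the same single language
      multilingual  -> train on all languages together, test on each one
      cross-lingual -> train on one language, test on a different one
    """
    langs = sorted(languages)
    monolingual = [((lang,), lang) for lang in langs]
    multilingual = [(tuple(langs), lang) for lang in langs]
    cross_lingual = [
        ((src,), tgt) for src in langs for tgt in langs if src != tgt
    ]
    return {
        "monolingual": monolingual,
        "multilingual": multilingual,
        "cross-lingual": cross_lingual,
    }


if __name__ == "__main__":
    # Hypothetical language codes, used only to show the shape of the output.
    plans = experiment_plans(["en", "fr", "it"])
    for setting, pairs in plans.items():
        print(setting)
        for train_langs, test_lang in pairs:
            print("  train on", ", ".join(train_langs), "-> test on", test_lang)
```

Under these assumptions, n languages yield n monolingual runs, n multilingual evaluations, and n(n-1) cross-lingual pairs, which is why cross-lingual comparisons grow fastest as languages are added.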
Abstract
Formality is an important characteristic of text documents. Automatically detecting the formality level of a text can benefit various natural language processing tasks. Two large-scale datasets with formality annotation have previously been introduced for multiple languages -- GYAFC and X-FORMAL. However, they have primarily been used to train style transfer models. At the same time, detecting text formality on its own may also be a useful application. This work presents what is, to our knowledge, the first systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning methods, and delivers the best-performing models for public use. We conducted three types of experiments -- monolingual, multilingual, and cross-lingual. The study shows that the Char BiLSTM model outperforms Transformer-based ones…
Taxonomy
Topics: Natural Language Processing Techniques · Text and Document Classification Technologies · Topic Modeling
