HinFlair: pre-trained contextual string embeddings for pos tagging and text classification in the Hindi language
Harsh Patel

TL;DR
HinFlair is a pre-trained Hindi language model based on contextual string embeddings that improves performance on POS tagging and text classification tasks, surpassing previous models and some transformer-based approaches.
Contribution
The paper introduces HinFlair, a monolingual Hindi language representation model using contextual string embeddings, specifically designed to better capture Hindi linguistic features.
Findings
HinFlair outperforms previous state-of-the-art embeddings in Hindi NLP tasks.
HinFlair combined with FastText surpasses some transformer models for Hindi.
The model shows strong results on multiple Hindi text classification datasets and a dependency treebank.
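The second finding concerns HinFlair combined with FastText. One common way to combine a contextual embedding with a static word embedding is to concatenate the two vectors for each token; the summary does not say how the paper does it, so the sketch below is only an illustration of that general idea. Every name in it (static_embedder, toy_contextual_embedder, stack) and every vector value is hypothetical, and the two embedders are toy stand-ins, not the real HinFlair or FastText models.

```python
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[List[str]], List[Vector]]


def static_embedder(table: Dict[str, Vector], dim: int) -> Embedder:
    """Hypothetical stand-in for static word vectors (FastText-style lookup).

    Each word always maps to the same vector; unknown words get zeros.
    """
    def embed(tokens: List[str]) -> List[Vector]:
        return [list(table.get(tok, [0.0] * dim)) for tok in tokens]
    return embed


def toy_contextual_embedder(tokens: List[str]) -> List[Vector]:
    """Hypothetical stand-in for a contextual embedder.

    The vector for a token depends on its neighbours, so the same word
    can receive different vectors in different sentences.
    """
    vectors = []
    for i, tok in enumerate(tokens):
        left = len(tokens[i - 1]) if i > 0 else 0
        right = len(tokens[i + 1]) if i + 1 < len(tokens) else 0
        vectors.append([float(left), float(len(tok)), float(right)])
    return vectors


def stack(embedders: List[Embedder], tokens: List[str]) -> List[Vector]:
    """Concatenate the per-token vectors produced by each embedder."""
    per_embedder = [embed(tokens) for embed in embedders]
    stacked = []
    for token_vectors in zip(*per_embedder):
        combined: Vector = []
        for vec in token_vectors:
            combined.extend(vec)
        stacked.append(combined)
    return stacked


# Made-up 2-dimensional static vectors for illustration only.
static = static_embedder({"किताब": [1.0, 2.0]}, dim=2)
sentence = ["यह", "किताब", "है"]
vectors = stack([static, toy_contextual_embedder], sentence)
```

Each token ends up with one vector whose length is the sum of the two embedding sizes (here 2 + 3), which a downstream tagger or classifier would then consume.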
Abstract
Recent advancements in language models based on recurrent neural networks and the transformer architecture have achieved state-of-the-art results on a wide range of natural language processing tasks such as POS tagging, named entity recognition, and text classification. However, most of these language models are pre-trained on high-resource languages like English, German, and Spanish. Multilingual language models include Indian languages like Hindi, Telugu, and Bengali in their training corpus, but they often fail to represent the linguistic features of these languages as they are not the primary languages of the study. We introduce HinFlair, a language representation model (contextual string embeddings) pre-trained on a large monolingual Hindi corpus. Experiments were conducted on 6 text classification datasets and a Hindi dependency treebank to analyze the performance of these…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
Methods: fastText
