Multichannel LSTM-CNN for Telugu Technical Domain Identification

Sunil Gundapu; Radhika Mamidi

arXiv:2102.12179·cs.CL·February 25, 2021·1 cites

Open Access

TL;DR

This paper introduces a Multichannel LSTM-CNN model for identifying technical domains in Telugu text. In the ICON shared task TechDOfication 2020 (task h), the system scored an F1 of 69.9% on the test set and 90.01% on the validation set.

Contribution

The paper presents a Multichannel LSTM-CNN architecture for Telugu technical domain identification and evaluates it in the ICON shared task TechDOfication 2020 (task h).

Findings

01 F1 score of 69.9% on test data

02 F1 score of 90.01% on validation data

03 Effective for Telugu NLP domain identification

Abstract

With the rapid growth of text information, retrieving domain-oriented information from text data has a broad range of applications in Information Retrieval and Natural Language Processing. Thematic keywords give a compressed representation of the text. Domain Identification plays a significant role in Machine Translation, Text Summarization, Question Answering, Information Extraction, and Sentiment Analysis. In this paper, we propose a Multichannel LSTM-CNN methodology for Technical Domain Identification for Telugu. This architecture was evaluated in the context of the ICON shared task TechDOfication 2020 (task h), and our system achieved an F1 score of 69.9% on the test dataset and 90.01% on the validation set.
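The abstract reports F1 scores without saying how they are averaged across domains. As a rough illustration only, the sketch below computes per-class and macro-averaged F1 for a multi-class domain classifier. The macro averaging, the function names `f1_per_class` and `macro_f1`, and the domain labels (`cse`, `phy`, `bio`) are assumptions made for this example and are not taken from the paper or the shared task.

```python
from collections import Counter


def f1_per_class(gold, pred):
    """Return {label: F1} from parallel lists of gold and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true class but was missed
    scores = {}
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels, for illustration only
gold = ["cse", "phy", "cse", "bio", "phy", "bio"]
pred = ["cse", "cse", "cse", "bio", "phy", "phy"]

print(f1_per_class(gold, pred))
print(round(macro_f1(gold, pred), 4))
```

Macro averaging weights every domain equally regardless of how many sentences it contains. A micro or weighted average would give different numbers on imbalanced data, so the averaging scheme matters when comparing scores.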

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies