TechTexC: Classification of Technical Texts using Convolution and Bidirectional Long Short Term Memory Network
Omar Sharif, Eftekhar Hossain, Mohammed Moshiul Hoque

TL;DR
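This paper presents TechTexC, a system that classifies technical texts using CNN, BiLSTM, and combined CNN-BiLSTM models. It covers coarse-grained technical domains in several languages and fine-grained computer science sub-domains, with the combined model reaching F1 scores of about 82 on task-1 sub-tasks a, b, and c of the TechDofication 2020 shared task.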
Contribution
The paper introduces a combined CNN and BiLSTM model for technical text classification, outperforming individual models in shared task evaluations.
Findings
The combined CNN and BiLSTM model achieved the highest F1 scores on the development dataset.
The combined model outperformed the individual CNN and BiLSTM models on task-1 sub-tasks a, b, c, and g and on task-2a.
Achieved up to 70% accuracy on test datasets for sub-domain classification.
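The findings above are reported as F1 scores on a 0-100 scale. As a rough illustration of how such a score can be computed for a multi-class classifier, here is a minimal sketch in plain Python. It assumes macro averaging (the unweighted mean of per-class F1); this page does not say which averaging the shared task used. The function names and the toy labels "A", "B", "C" are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for every label that appears in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the true class but was missed
    scores = {}
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, for illustration only (not shared-task data).
gold = ["A", "B", "A", "C", "B", "A"]
pred = ["A", "A", "A", "C", "B", "B"]

# Scale to 0-100 to match the way scores are reported on this page.
print(round(100 * macro_f1(gold, pred), 2))
```

If the task instead used weighted or micro averaging, only the final aggregation in macro_f1 would change; the per-class counts stay the same.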
Abstract
This paper describes a technical text classification system and its results, developed as part of participation in the shared task TechDofication 2020. The shared task consists of two tasks: (i) the first identifies the coarse-grained technical domain of a given text in a specified language, and (ii) the second classifies a text from the computer science domain into fine-grained sub-domains. A classification system (called 'TechTexC') is developed to perform the classification using three techniques: a convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network, and a combined CNN with BiLSTM. Results show that the CNN with BiLSTM model outperforms the other techniques on sub-tasks a, b, c, and g of task-1 and on task-2a. This combined model obtained F1 scores of 82.63 (sub-task a), 81.95 (sub-task b), 82.39 (sub-task c),…
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Natural Language Processing Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Convolution · Bidirectional LSTM
