Hawk: An Industrial-strength Multi-label Document Classifier

Arshad Javeed

arXiv:2301.06057·cs.CL·January 18, 2023

TL;DR

Hawk introduces a neural network architecture for multi-label document classification that effectively handles variable-length texts, online updates, visualization, imbalanced data, and scalability, outperforming existing methods.

Contribution

The paper presents a novel hydranet-like neural architecture that addresses key industrial challenges in multi-label document classification, including modularity, visualization, and handling skewed datasets.
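As a rough illustration of what a hydranet-like design can look like, the sketch below gives each label its own small head over a shared sequence of sentence vectors. Each head computes attention weights over the sentences, pools them, and emits a probability. Everything here is an assumption made for illustration and is not taken from the paper: the `LabelHead` class, the hard-coded 2-dimensional sentence vectors, the label names, and the weights are all hypothetical, and a real system would obtain sentence vectors from a learned encoder.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class LabelHead:
    """Hypothetical per-label head: attention over sentences, then a sigmoid."""

    def __init__(self, attn_vec, out_vec, bias=0.0):
        self.attn_vec = attn_vec  # scores each sentence's relevance to this label
        self.out_vec = out_vec    # maps the pooled vector to a logit
        self.bias = bias

    def predict(self, sentence_vecs):
        # One attention weight per sentence, so any document length works.
        attn = softmax([dot(self.attn_vec, s) for s in sentence_vecs])
        dim = len(sentence_vecs[0])
        pooled = [sum(a * s[i] for a, s in zip(attn, sentence_vecs))
                  for i in range(dim)]
        logit = dot(self.out_vec, pooled) + self.bias
        prob = 1.0 / (1.0 + math.exp(-logit))
        return prob, attn

# Heads are independent: adding "sports" later does not modify "finance".
heads = {"finance": LabelHead([1.0, 0.0], [2.0, 0.0])}
heads["sports"] = LabelHead([0.0, 1.0], [0.0, 2.0])

# A toy document of three sentences (assumed, pre-computed sentence vectors).
doc = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
results = {name: head.predict(doc) for name, head in heads.items()}
```

In this sketch the attention weights returned alongside each probability indicate which sentences a head relied on, which is one simple way to spotlight relevant text for a prediction.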

Findings

01 Outperforms existing methods on benchmark datasets

02 Attention mechanism improves model performance

03 Weighted loss functions enhance task-specific accuracy
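To make the idea of a weighted loss concrete, here is a minimal sketch of a weighted binary cross-entropy for multi-label targets. The weighting rule (negatives divided by positives per label) is a common heuristic assumed here for illustration; the summary above does not state which weighting the paper uses, and the function names and toy data are hypothetical.

```python
import math

def positive_weights(label_matrix):
    """Per-label weight = #negatives / #positives (assumed heuristic)."""
    n_docs = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        pos = sum(row[j] for row in label_matrix)
        neg = n_docs - pos
        # Fall back to 1.0 when a label has no positive examples.
        weights.append(neg / pos if pos > 0 else 1.0)
    return weights

def weighted_bce(probs, targets, pos_weights, eps=1e-12):
    """Mean weighted binary cross-entropy over the labels of one document."""
    total = 0.0
    for p, y, w in zip(probs, targets, pos_weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Toy label matrix: label 0 is frequent, label 1 is rare.
labels = [[1, 0], [1, 0], [1, 0], [0, 1]]
weights = positive_weights(labels)

# A prediction that misses the rare label is penalized more once weighted.
bad_probs, target = [0.9, 0.1], [0, 1]
loss_weighted = weighted_bce(bad_probs, target, weights)
loss_plain = weighted_bce(bad_probs, target, [1.0, 1.0])
```

With these toy labels the rare label receives a larger weight than the frequent one, so errors on it contribute more to the loss.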

Abstract

There is a plethora of methods and algorithms that solve the classical multi-label document classification problem. However, when it comes to deployment and usage in an industry setting, most, if not all, of the contemporary approaches fail to address some of the vital aspects or requirements of an ideal solution: (i) the ability to operate on variable-length texts and rambling documents; (ii) the catastrophic forgetting problem; (iii) modularity when it comes to online learning and updating the model; (iv) the ability to spotlight relevant text while producing the prediction, i.e. visualizing the predictions; (v) the ability to operate on imbalanced or skewed datasets; (vi) scalability. The paper describes the significance of these problems in detail and proposes a unique neural network architecture that addresses them. The proposed architecture views documents as a sequence of sentences and leverages…

Taxonomy

Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Web Data Mining and Analysis
