Hawk: An Industrial-strength Multi-label Document Classifier
Arshad Javeed

TL;DR
Hawk introduces a neural network architecture for multi-label document classification that effectively handles variable-length texts, online updates, visualization, imbalanced data, and scalability, outperforming existing methods.
Contribution
The paper presents a novel hydranet-like neural architecture that addresses key industrial challenges in multi-label document classification, including modularity, visualization, and handling skewed datasets.
Findings
Outperforms existing methods on benchmark datasets
Attention mechanism improves model performance
Weighted loss functions enhance task-specific accuracy
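As an illustration of the weighted-loss finding, here is a minimal sketch of one common way to weight a multi-label loss on a skewed dataset. This summary does not say which weighting scheme the paper uses, so everything here is an assumption: the function names `inverse_frequency_weights` and `weighted_bce` are hypothetical, and the choice of negatives-to-positives ratio as the per-label weight is a generic technique, not the author's method.

```python
import math

def inverse_frequency_weights(label_matrix):
    """Per-label positive weight = negatives / positives (assumed scheme).

    label_matrix: list of rows, one per document, each a list of 0/1 labels.
    Labels with no positive example fall back to a weight of 1.0.
    """
    n_docs = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in label_matrix)
        negatives = n_docs - positives
        weights.append(negatives / positives if positives else 1.0)
    return weights

def weighted_bce(probs, targets, pos_weights):
    """Mean weighted binary cross-entropy over the labels of one document.

    probs: predicted probability per label, targets: 0/1 per label,
    pos_weights: multiplier applied to the positive-class term of each label.
    """
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for p, y, w in zip(probs, targets, pos_weights):
        p = min(max(p, eps), 1.0 - eps)
        total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Toy data: label 0 is rare (1 of 4 documents), label 1 is common (3 of 4).
labels = [[1, 1], [0, 1], [0, 1], [0, 0]]
weights = inverse_frequency_weights(labels)
loss = weighted_bce([0.5, 0.5], [1, 1], weights)
```

With this scheme, a missed positive on the rare label costs more than a missed positive on the common one, which is the usual motivation for weighting on imbalanced data.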
Abstract
There is a plethora of methods and algorithms that solve the classical multi-label document classification problem. However, when it comes to deployment and usage in an industry setting, most, if not all, of the contemporary approaches fail to address some of the vital aspects or requirements of an ideal solution: (i) the ability to operate on variable-length texts and rambling documents; (ii) the catastrophic forgetting problem; (iii) modularity when it comes to online learning and updating the model; (iv) the ability to spotlight relevant text while producing the prediction, i.e., visualizing the predictions; (v) the ability to operate on imbalanced or skewed datasets; (vi) scalability. The paper describes the significance of these problems in detail and proposes a unique neural network architecture that addresses them. The proposed architecture views documents as a sequence of sentences and leverages…
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques · Web Data Mining and Analysis
