MATCH: Metadata-Aware Text Classification in A Large Hierarchy
Yu Zhang, Zhihong Shen, Yuxiao Dong, Kuansan Wang, Jiawei Han

TL;DR
This paper introduces MATCH, an end-to-end framework for multi-label text classification that effectively utilizes both document metadata and large label hierarchies, outperforming existing methods.
Contribution
It formalizes the problem of metadata-aware text classification in a large label hierarchy and proposes a novel framework that integrates metadata embeddings and hierarchy regularization.
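The hierarchy regularization mentioned above can be illustrated with a small sketch. It assumes two plausible forms of such regularization, which is our reading and not the paper's code. The first is a parameter-space penalty that pulls each child label's classifier weights toward its parent's. The second is an output-space penalty that discourages a child label from being predicted as more probable than its parent. Both are added to a standard multi-label binary cross-entropy loss. The function names, the toy labels, and the weighting coefficients `lam_param` and `lam_output` are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical toy hierarchy: child label -> parent label (root labels have no entry).
ParentMap = Dict[str, str]


def parameter_regularizer(weights: Dict[str, List[float]], parent_of: ParentMap) -> float:
    """Sum of squared L2 distances between each child's classifier weights and its parent's."""
    total = 0.0
    for child, parent in parent_of.items():
        total += sum((c - p) ** 2 for c, p in zip(weights[child], weights[parent]))
    return total


def output_regularizer(probs: Dict[str, float], parent_of: ParentMap) -> float:
    """Hinge-style penalty whenever a child label is predicted more probable than its parent."""
    return sum(max(0.0, probs[child] - probs[parent]) for child, parent in parent_of.items())


def binary_cross_entropy(probs: Dict[str, float], targets: Dict[str, int], eps: float = 1e-12) -> float:
    """Mean per-label binary cross-entropy, a common multi-label classification loss."""
    losses = []
    for label, p in probs.items():
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        y = targets.get(label, 0)
        losses.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


def total_loss(probs, targets, weights, parent_of, lam_param=1e-6, lam_output=1e-6) -> float:
    """Classification loss plus both hierarchy penalties (coefficients are hypothetical)."""
    return (binary_cross_entropy(probs, targets)
            + lam_param * parameter_regularizer(weights, parent_of)
            + lam_output * output_regularizer(probs, parent_of))


# Toy example: three labels in a chain machine_learning -> deep_learning -> cnn.
parent_of = {"deep_learning": "machine_learning", "cnn": "deep_learning"}
weights = {"machine_learning": [1.0, 0.0], "deep_learning": [1.0, 1.0], "cnn": [0.0, 1.0]}
probs = {"machine_learning": 0.9, "deep_learning": 0.6, "cnn": 0.8}
targets = {"machine_learning": 1, "deep_learning": 1, "cnn": 1}

print(parameter_regularizer(weights, parent_of))  # 2.0
print(output_regularizer(probs, parent_of))       # about 0.2: "cnn" outscores its parent
```

In this toy case, only the `cnn` to `deep_learning` edge violates the "child is no more likely than parent" constraint, so only that edge contributes to the output penalty.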
Findings
MATCH outperforms state-of-the-art baselines on large-scale datasets.
The framework effectively combines metadata and hierarchy signals.
Experimental results show accuracy improvements over the compared baselines.
Abstract
Multi-label text classification refers to the problem of assigning each given document its most relevant labels from the label set. Commonly, the metadata of the given documents and the hierarchy of the labels are available in real-world applications. However, most existing studies focus only on modeling the text information, with a few attempts to utilize either metadata or hierarchy signals, but not both of them. In this paper, we bridge the gap by formalizing the problem of metadata-aware text classification in a large label hierarchy (e.g., with tens of thousands of labels). To address this problem, we present the MATCH solution, an end-to-end framework that leverages both metadata and hierarchy information. To incorporate metadata, we pre-train the embeddings of text and metadata in the same space and also leverage fully-connected attention to capture the interrelations…
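The abstract's description of metadata handling embeds text and metadata in a shared space and uses fully-connected attention to relate them. The sketch below illustrates that idea under several assumptions. The embeddings are taken as already pre-trained. Metadata items (for illustration, things like authors or a venue) are treated as extra tokens prepended to the word sequence. Attention is a single head with no learned projections. The function names and vector values are hypothetical, and the paper's actual architecture (heads, layers, projection matrices) is not reproduced here.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def full_self_attention(tokens: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with no mask: every token attends to every token.

    For simplicity, queries, keys, and values are the input embeddings themselves
    (single head, no learned projection matrices).
    """
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all token embeddings.
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return outputs


# Hypothetical pre-trained embeddings living in one shared 2-d space.
metadata = [[1.0, 0.0], [0.0, 1.0]]  # e.g. an author token and a venue token
words = [[0.5, 0.5], [1.0, 1.0]]     # word tokens from the document text
contextual = full_self_attention(metadata + words)
```

Because the attention is unmasked, each word representation can draw on the metadata tokens and vice versa. That is the sense in which such attention can capture text-metadata interrelations.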
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Spam and Phishing Detection
