On Horizontal and Vertical Separation in Hierarchical Text Classification

Mostafa Dehghani; Hosein Azarbonyad; Jaap Kamps; Maarten Marx

arXiv:1609.00514·cs.IR·September 5, 2016

TL;DR

This paper investigates the importance of separability in hierarchical text classification, proposing models that improve accuracy by considering entity positions both within and across hierarchy levels.

Contribution

It introduces the Strong Separation Principle and Hierarchical Significant Words Language Models (HSWLM) for better hierarchical data representation and classification.

Findings

01

HSWLM captures essential hierarchical features

02

Improved classification accuracy demonstrated on real data

03

Models are transferable over time

Abstract

Hierarchy is a common and effective way of organizing data and representing their relationships at different levels of abstraction. However, hierarchical data dependencies cause difficulties in the estimation of "separable" models that can distinguish between the entities in the hierarchy. Extracting separable models of hierarchical entities requires us to take their relative position into account and to consider the different types of dependencies in the hierarchy. In this paper, we present an investigation of the effect of separability in text-based entity classification and argue that in hierarchical classification, a separation property should be established between entities not only in the same layer, but also in different layers. Our main findings are the following. First, we analyse the importance of separability on the data representation in the task of classification and based…
