How is a data-driven approach better than random choice in label space division for multi-label classification?

Piotr Szymański; Tomasz Kajdanowicz; Kristian Kersting

arXiv:1606.02346·cs.LG·August 24, 2016

TL;DR

This paper shows that partitioning the label space with community detection algorithms run on label co-occurrence graphs is more likely to outperform random label partitioning in multi-label classification, based on 12 benchmark data sets and five evaluation measures.

Contribution

It introduces a novel approach of applying social network community detection methods to partition label space in multi-label classification, outperforming random partitioning methods.

Findings

01

Community detection methods outperform random partitioning in most cases.

02

Fastgreedy and walktrap on weighted label co-occurrence graphs are 85-92% more likely than random partitioning to yield better F1 scores.

03

Infomap on unweighted label co-occurrence graphs outperforms random partitioning about 90% of the time in terms of Subset Accuracy and Jaccard similarity.

Abstract

We propose using five data-driven community detection approaches from social networks to partition the label space for the task of multi-label classification as an alternative to random partitioning into equal subsets as performed by RAkELd: modularity-maximizing fastgreedy and leading eigenvector, infomap, walktrap and label propagation algorithms. We construct a label co-occurrence graph (both weighted and unweighted versions) based on training data and perform community detection to partition the label set. We include Binary Relevance and Label Powerset classification methods for comparison. We use gini-index based Decision Trees as the base classifier. We compare educated approaches to label space divisions against random baselines on 12 benchmark data sets over five evaluation measures. We show that in almost all cases seven educated guess approaches are more likely to outperform…
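The pipeline described in the abstract (build a label co-occurrence graph from training labels, then detect communities to partition the label set) can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names `cooccurrence_graph` and `label_propagation` and the toy matrix `Y` are hypothetical, and the community step is a simplified, deterministic label propagation with ties broken by smallest community id. It is a stand-in for the library implementations of fastgreedy, walktrap, infomap and the other methods the paper evaluates.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(Y, weighted=True):
    """Build a label co-occurrence graph from rows of 0/1 label indicators.

    Returns {(i, j): weight} with i < j. The weight counts the training
    samples that carry both labels; the unweighted variant uses 1 per edge.
    """
    edges = Counter()
    for row in Y:
        active = [j for j, value in enumerate(row) if value]
        for a, b in combinations(active, 2):
            edges[(a, b)] += 1
    if not weighted:
        return {edge: 1 for edge in edges}
    return dict(edges)


def label_propagation(edges, n_labels, max_iter=100):
    """Toy deterministic label propagation (assumption: not the paper's code).

    Each label starts in its own community and repeatedly adopts the
    community with the largest total edge weight among its neighbours.
    """
    neighbours = defaultdict(dict)
    for (a, b), weight in edges.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    community = list(range(n_labels))
    for _ in range(max_iter):
        changed = False
        for node in range(n_labels):
            if not neighbours[node]:
                continue  # isolated label keeps its own community
            votes = Counter()
            for other, weight in neighbours[node].items():
                votes[community[other]] += weight
            # Highest vote wins; ties go to the smallest community id.
            best = min(votes, key=lambda c: (-votes[c], c))
            if best != community[node]:
                community[node] = best
                changed = True
        if not changed:
            break

    groups = defaultdict(list)
    for node, c in enumerate(community):
        groups[c].append(node)
    return sorted(groups.values())


# Hypothetical training labels: 6 samples, 5 labels.
Y = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 1, 0, 0],
]
weighted_edges = cooccurrence_graph(Y, weighted=True)
partition = label_propagation(weighted_edges, n_labels=5)
print(partition)
```

Each resulting group of labels would then get its own classifier (for example, Label Powerset over a decision tree), in place of the equal-sized random subsets that RAkELd draws.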
