A Network Perspective on Stratification of Multi-Label Data
Piotr Szymański, Tomasz Kajdanowicz

TL;DR
This paper introduces a second-order iterative stratification method for multi-label data that improves the stability and quality of data splits, leading to better classification performance and more reliable evaluation.
Contribution
It extends existing stratification techniques to account for second-order label relationships, enhancing data split quality for multi-label classification.
Findings
Reduces variance in classification quality across folds.
Improves label pair distribution and stability of network characteristics.
Maintains competitive label-oriented classification metrics.
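To make "label pair distribution" concrete, here is a minimal sketch of how one might inspect it for a given fold assignment. It is an illustration only, not the authors' algorithm or their evaluation measures; the function names (`label_pairs`, `pair_counts_per_fold`, `pairs_missing_from_some_fold`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def label_pairs(labels):
    """Return all unordered label pairs present in one sample's label set."""
    return list(combinations(sorted(set(labels)), 2))


def pair_counts_per_fold(samples, fold_of):
    """Count label-pair occurrences in each fold.

    samples: list of label collections, one per sample.
    fold_of: list of fold indices, aligned with samples.
    """
    counts = {}
    for labels, fold in zip(samples, fold_of):
        counts.setdefault(fold, Counter()).update(label_pairs(labels))
    return counts


def pairs_missing_from_some_fold(samples, fold_of):
    """Return label pairs that occur in the data but are absent from at least one fold."""
    counts = pair_counts_per_fold(samples, fold_of)
    all_pairs = set().union(*(c.keys() for c in counts.values()))
    return {p for p in all_pairs if any(p not in c for c in counts.values())}


# Toy data (assumed for illustration): four samples assigned to two folds.
samples = [["a", "b"], ["a", "b"], ["a", "c"], ["b"]]
fold_of = [0, 1, 0, 1]
missing = pairs_missing_from_some_fold(samples, fold_of)
```

In this toy assignment the pair `("a", "c")` appears only in fold 0, so it is reported as missing from fold 1. A split that accounts for second-order relationships aims to keep such pairs represented across folds where the data allows.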
Abstract
In recent years, we have witnessed the development of multi-label classification methods that utilize the structure of the label space in a divide-and-conquer approach to improve classification performance and allow large data sets to be classified efficiently. Yet most of the available data sets have been provided in train/test splits that did not account for maintaining a distribution of higher-order relationships between labels among splits or folds. We present a new approach to stratifying multi-label data for classification purposes, based on the iterative stratification approach proposed by Sechidis et al. in an ECML PKDD 2011 paper. Our method extends the iterative approach to take into account second-order relationships between labels. The results are evaluated using statistical properties of the obtained strata, as presented by Sechidis et al. We also propose new statistical…
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Clustering Algorithms Research
