Extreme Multi-label Completion for Semantic Document Labelling with Taxonomy-Aware Parallel Learning
Julien Audiffren, Christophe Broillet, Ljiljana Dolamic, Philippe Cudré-Mauroux

TL;DR
This paper introduces TAMLEC, a novel taxonomy-aware multi-task learning approach for extreme multi-label document completion, leveraging hierarchical label structures and parallel feature sharing to improve accuracy, especially in few-shot scenarios.
Contribution
TAMLEC is the first method to combine taxonomy-aware multi-task learning with parallel feature sharing for extreme multi-label completion (XMLCo), enhancing performance on large label sets and few-shot tasks.
Findings
TAMLEC outperforms state-of-the-art XMLCo methods on real datasets.
TAMLEC is highly effective in few-shot label prediction scenarios.
The approach leverages hierarchical label structures for improved accuracy.
Abstract
In Extreme Multi-Label Completion (XMLCo), the objective is to predict the missing labels of a collection of documents. Together with XML Classification, XMLCo is arguably one of the most challenging document classification tasks, as the number of labels (at least tens of thousands) is generally very large compared to the number of available labelled documents in the training dataset. Such a task is often accompanied by a taxonomy that encodes the labels' organic relationships, and many methods have been proposed to leverage this hierarchy to improve the results of XMLCo algorithms. In this paper, we propose a new approach to this problem, TAMLEC (Taxonomy-Aware Multi-task Learning for Extreme multi-label Completion). TAMLEC divides the problem into several Taxonomy-Aware Tasks, i.e. subsets of labels adapted to the hierarchical paths of the taxonomy, and trains on these tasks…
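As a rough illustration of what "subsets of labels adapted to the hierarchical paths of the taxonomy" could look like, the sketch below groups the labels of a toy taxonomy by their top-level ancestor. This is not the TAMLEC construction, which the abstract does not specify; the taxonomy, the `PARENT` mapping, and the helper names `path_to_root` and `build_tasks` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: child label -> parent label (None marks the root).
PARENT = {
    "root": None,
    "science": "root",
    "arts": "root",
    "physics": "science",
    "biology": "science",
    "optics": "physics",
    "music": "arts",
}


def path_to_root(label, parent):
    """Return the labels on the path from `label` up to the root, inclusive."""
    path = [label]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def build_tasks(parent):
    """Group labels by top-level ancestor (assumed grouping rule, not the paper's)."""
    tasks = defaultdict(set)
    for label in parent:
        path = path_to_root(label, parent)
        if len(path) < 2:
            continue  # skip the root itself
        top_level = path[-2]  # the child of the root on this label's path
        tasks[top_level].add(label)
    return dict(tasks)


tasks = build_tasks(PARENT)
for name, labels in sorted(tasks.items()):
    print(name, sorted(labels))
```

Under this assumed rule, each task collects every label whose path to the root passes through the same top-level node, so a model trained on one task only has to discriminate among labels in a single branch of the hierarchy.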
Taxonomy
Topics: Text and Document Classification Technologies · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
