Multi-label Learning with Random Circular Vectors
Ken Nishida, Kojiro Machi, Kazuma Onishi, Katsuhiko Hayashi, Hidetaka Kamigaito

TL;DR
This paper introduces a novel approach for extreme multi-label classification using random circular vectors, which improves label encoding and retrieval, leading to better performance and significantly smaller output layers in deep neural networks.
Contribution
It proposes using complex-valued circular vectors for label encoding in DNNs, enhancing efficiency and accuracy in XMC tasks compared to traditional real-valued vectors.
Findings
Circular vectors outperform real-valued vectors in label encoding capacity.
The method achieves significant performance improvements on XMC datasets.
Output layer size is reduced by up to 99% with circular vectors.
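To make the size claim concrete, the arithmetic below compares a conventional fully connected output layer (one unit per label) with a layer that predicts a low-dimensional circular vector. The hidden width, label count, and vector dimension are hypothetical values chosen for illustration, not figures from the paper, and the sketch assumes one real-valued output (a phase) per circular component.

```python
def dense_layer_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of a single fully connected layer."""
    return in_features * out_features + out_features

# Hypothetical sizes, chosen only to illustrate the arithmetic.
hidden = 512          # width of the last hidden layer (assumed)
num_labels = 100_000  # size of the label set (assumed)
circular_dim = 1_000  # dimension of the circular vector (assumed)

fc_params = dense_layer_params(hidden, num_labels)    # one output per label
cv_params = dense_layer_params(hidden, circular_dim)  # one output per phase

reduction = 1 - cv_params / fc_params
print(f"FC output layer:       {fc_params:,} parameters")
print(f"Circular output layer: {cv_params:,} parameters")
print(f"Reduction:             {reduction:.0%}")
```

Under these assumptions the reduction depends only on the ratio of the vector dimension to the number of labels, so a dimension that is 1% of the label count gives a 99% smaller output layer.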
Abstract
The extreme multi-label classification (XMC) task involves learning a classifier that can predict from a large label set the most relevant subset of labels for a data instance. While deep neural networks (DNNs) have demonstrated remarkable success in XMC problems, the task is still challenging because it must deal with a large number of output labels, which makes the DNN training computationally expensive. This paper addresses the issue by exploring the use of random circular vectors, where each vector component is represented as a complex amplitude. In our framework, we can develop an output layer and loss function of DNNs for XMC by representing the final output layer as a fully connected layer that directly predicts a low-dimensional circular vector encoding a set of labels for a data instance. We conducted experiments on synthetic datasets to verify that circular vectors have better…
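The idea of encoding a label set as a single low-dimensional circular vector can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: each label gets a random vector whose components are unit-modulus complex numbers with uniformly random phases, a label set is encoded by plain superposition (summing), and labels are retrieved by ranking a normalized real inner product. The function names, the sizes, and the omission of any network, loss function, or binding step are choices made for this sketch.

```python
import numpy as np

def random_circular_vectors(num: int, dim: int, rng: np.random.Generator) -> np.ndarray:
    """Sample `num` vectors whose components are exp(i * phase), phase ~ U(-pi, pi)."""
    phases = rng.uniform(-np.pi, np.pi, size=(num, dim))
    return np.exp(1j * phases)

def encode_label_set(label_vectors: np.ndarray, label_ids: list[int]) -> np.ndarray:
    """Encode a set of labels as the sum (superposition) of their vectors."""
    return label_vectors[label_ids].sum(axis=0)

def score_labels(encoded: np.ndarray, label_vectors: np.ndarray) -> np.ndarray:
    """Score every label by Re<label, encoded> / dim (about 1 if present, about 0 if not)."""
    return np.real(label_vectors.conj() @ encoded) / encoded.shape[0]

rng = np.random.default_rng(0)
num_labels, dim = 1000, 256            # hypothetical sizes for illustration
labels = random_circular_vectors(num_labels, dim, rng)

positives = [3, 141, 592]              # hypothetical ground-truth label set
encoded = encode_label_set(labels, positives)

scores = score_labels(encoded, labels)
top = sorted(np.argsort(scores)[-len(positives):].tolist())
print(top)
```

Because random high-dimensional phase vectors are nearly orthogonal, labels in the set score close to 1 while the rest score close to 0. Note that this retrieval step scores every label, so its cost grows linearly with the size of the label set.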
Peer Reviews
Decision·Submitted to ICLR 2024
1. The motivation is clear and the algorithm is sensible. 2. The proposed method is tested on several benchmarks.
The paper is in general easy to follow and well-structured. There are some interesting theoretical guarantees, which seem simple and effective. Nevertheless, I have the following concerns: 1. Not enough empirical evaluation. It is necessary to evaluate on other state-of-the-art benchmarks. 2. What is the computational cost of the method? [addressed by rebuttal] 3. Will the code be shared? [addressed by rebuttal]
* The paper introduces the innovative idea of complex-valued holographic reduced representations (CHRR) for XML tasks, which can significantly improve XML prediction accuracy over and above that achieved by real-valued HRR. * Experiments demonstrate the efficacy of the proposed approach on several moderately large-scale XML datasets in terms of P and PSP gains.
* The contributions of this paper are rather limited. The key ideas behind adapting HRR to XML, such as unitary normalization and the HRR XML loss, have been borrowed from [Ganesan et al. '21]. The main novelty lies in generalizing real to complex HRR. While this is useful, its efficiency-accuracy trade-offs relative to the original HRR have not been well established. * The experimental validation of the proposed approach is weak. - Datasets involve moderate scale XML datasets and none from >1 million sca
1. The overall presentation of the paper is clear and easy to follow 2. Using circular vectors with complex amplitude is technically sound
1. CHRR does not reduce model parameters at the inference stage, compared to the FC baseline. 2. The inference time complexity of CHRR is as high as `O(L)` while FC and HRR can be `O(log(L))`. 3. The experimental results are not very comprehensive. See detailed questions below.
Taxonomy
Topics: Text and Document Classification Technologies
Methods: Sparse Evolutionary Training
