Retrieval-augmented Encoders for Extreme Multi-label Text Classification

Yau-Shian Wang, Wei-Cheng Chang, Jyun-Yu Jiang, Jiong Zhang, Hsiang-Fu Yu, and S. V. N. Vishwanathan

arXiv:2502.10615 · cs.CL · February 18, 2025

TL;DR

This paper introduces RAEXMC, a retrieval-augmented encoder framework for extreme multi-label text classification that enhances generalization and memorization without extra trainable parameters, achieving state-of-the-art results and significant speedups.

Contribution

The paper proposes RAEXMC, a retrieval-augmented dual-encoder framework that improves extreme multi-label classification by combining retrieval with contrastive training, eliminating the need for complex model combinations.

Findings

1. RAEXMC outperforms existing methods on four benchmarks.
2. RAEXMC achieves a speedup of over 10x on large-scale datasets.
3. RAEXMC improves upon the state-of-the-art DEXML method.

Abstract

Extreme multi-label classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. To tackle such a vast label space, current state-of-the-art methods fall into two categories. The one-versus-all (OVA) method uses learnable label embeddings for each label, excelling at memorization (i.e., capturing detailed training signals for accurate head label prediction). In contrast, the dual-encoder (DE) model maps input and label text into a shared embedding space for better generalization (i.e., the capability of predicting tail labels with limited training data), but may fall short at memorization. To achieve both generalization and memorization, existing XMC methods often combine DE and OVA models, which involves complex training pipelines. Inspired by the success of retrieval-augmented language models, we propose the Retrieval-augmented…
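The abstract contrasts DE-style scoring (similarity between an input and each label's text embedding, which generalizes to tail labels) with memorization of training signals. The sketch below is a minimal NumPy illustration of one generic way to combine the two: a DE label score plus a kNN-style vote from training instances kept in a memory. It assumes precomputed, L2-normalized embeddings; the function names, the temperature, and the interpolation weight `lam` are hypothetical, and this is not the authors' exact RAEXMC formulation, which the truncated abstract does not spell out.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Row-wise L2 normalization so inner products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def retrieval_augmented_scores(
    q: np.ndarray,             # (d,) embedding of the input text
    label_emb: np.ndarray,     # (L, d) embeddings of label text (DE-style)
    inst_emb: np.ndarray,      # (N, d) embeddings of training inputs kept in memory
    inst_labels: np.ndarray,   # (N, L) binary ground-truth label matrix of those inputs
    temperature: float = 0.1,  # hypothetical softmax temperature
    lam: float = 0.5,          # hypothetical weight between the two paths
) -> np.ndarray:
    # Generalization path: score every label by similarity to its text embedding.
    label_probs = softmax(label_emb @ q / temperature)
    # Memorization path: weight stored training instances by similarity to the query...
    inst_weights = softmax(inst_emb @ q / temperature)
    # ...and let each instance vote for its own ground-truth labels.
    neighbor_votes = inst_weights @ inst_labels
    # Interpolate the two label distributions.
    return lam * label_probs + (1.0 - lam) * neighbor_votes


# Toy usage: two labels, two stored training instances.
label_emb = l2_normalize(np.array([[1.0, 0.0], [0.0, 1.0]]))
inst_emb = l2_normalize(np.array([[1.0, 0.1], [0.1, 1.0]]))
inst_labels = np.array([[0, 1], [1, 0]])  # toy data: instance 0 is tagged with label 1
q = l2_normalize(np.array([1.0, 0.0]))
print(retrieval_augmented_scores(q, label_emb, inst_emb, inst_labels))
```

With `lam=1.0` the query's top label comes purely from label-text similarity (label 0); with `lam=0.0` it comes from its nearest stored instance's annotations (label 1). Both paths reuse the same encoder outputs, so this sketch adds no trainable parameters beyond the encoder, in the spirit of the TL;DR; how RAEXMC actually builds its memory and combines scores is specified in the full paper.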

Peer Reviews

Decision: Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

1. This study provides a comprehensive overview and in-depth summary of the various approaches that have been developed for extreme multi-label classification. A thorough analysis is conducted of the advantages and disadvantages of each type of approach. This ensures that readers gain a robust understanding of the current models available.
2. The concept of introducing the retrieval-augmented method is interesting. This presents interesting possibilities for improving t…

Weaknesses

1. My primary concern regarding this study is its novelty. The concept of incorporating retrieval-augmented knowledge certainly has the potential to provide valuable background information that can enhance classification performance. However, aside from this innovative idea, the overall design of the model remains quite conventional and adheres to traditional methodologies, which may limit its effectiveness.
2. There is a lack of detailed information regarding the implementation o…

Reviewer 02 · Rating 5 · Confidence 5

Strengths

1. The idea of using retrieval augmentation with XMC is very interesting.
2. Storing existing dataset samples in memory is a nice trick.
3. Training seems to be very efficient. Also, the authors have performed extensive experimentation under many settings.

Weaknesses

1. OAK is a recent method that also uses memory for XMC tasks. How does RAE-XMC compare with OAK?
2. PSP metrics are commonly used across the XMC literature. Can you please report PSP as well?
3. Table 1: Does TT include memory construction time? If not, it would be nice to include it.
4. Improvements are somewhat weak in Table 1. On LF-AmazonTitles-131K, P@1 is not best for RAE-XMC. On LF-WikiSeeAlso-320K, RAE-XMC does not appear to be statistically significantly better than NGAME. Also, on LF-Amaz…
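Reviewer 02 asks for propensity-scored precision (PSP@k) alongside P@k. For reference, the sketch below computes both for a single test instance as they are commonly defined in the XMC literature, using the propensity model of Jain et al. (2016) with its commonly cited default constants A = 0.55 and B = 1.5 (benchmark-specific values differ). The function names are hypothetical, and normalization and test-set aggregation conventions vary across toolkits, so treat this as illustrative rather than as any benchmark's official scorer.

```python
import numpy as np


def propensity(label_freq: np.ndarray, num_train: int, A: float = 0.55, B: float = 1.5) -> np.ndarray:
    """Propensity model of Jain et al. (2016):
    p_l = 1 / (1 + C * (N_l + B) ** -A), with C = (log N - 1) * (B + 1) ** A,
    where N_l is the training frequency of label l and N the number of training points."""
    C = (np.log(num_train) - 1.0) * (B + 1.0) ** A
    return 1.0 / (1.0 + C * (label_freq + B) ** (-A))


def precision_at_k(scores: np.ndarray, y_true: np.ndarray, k: int) -> float:
    """P@k: fraction of the top-k predicted labels that are relevant."""
    top_k = np.argsort(-scores)[:k]
    return y_true[top_k].sum() / k


def psp_at_k(scores: np.ndarray, y_true: np.ndarray, prop: np.ndarray, k: int) -> float:
    """Normalized PSP@k for one instance: propensity-weighted hits in the top-k,
    divided by the best achievable propensity-weighted value (assumed convention)."""
    top_k = np.argsort(-scores)[:k]
    gain = (y_true[top_k] / prop[top_k]).sum()
    ideal = np.sort(y_true / prop)[::-1][:k].sum()
    return gain / ideal if ideal > 0 else 0.0


# Toy usage: a head, a torso, and a tail label (hypothetical training counts).
freq = np.array([500, 20, 1])
prop = propensity(freq, num_train=1000)
scores = np.array([0.9, 0.2, 0.7])
y_true = np.array([1, 0, 1])
print(precision_at_k(scores, y_true, 2), psp_at_k(scores, y_true, prop, 2))
```

Because rare labels get small propensities, a correct tail prediction contributes far more to PSP@k than a correct head prediction, which is why PSP is often used to check whether gains come from tail labels rather than head labels.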

Reviewer 03 · Rating 5 · Confidence 4

Strengths

- Very well written; I could follow each and every section.
- Training is scalable, and the architecture does not add to the training memory.

Weaknesses

- Does not compare with state-of-the-art XC methods such as OAK, which also uses retrieval-augmented encoders.
- Results are not reported on the short-text title datasets that are also available in the XC repository. I would suggest that the authors report numbers on the title datasets, as they are closer to real-world tasks.
- OAK uses auxiliary information; why was this auxiliary information not included in the knowledge memory?

Taxonomy

Topics: Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining