Multi-Label Test-Time Adaptation with Bound Entropy Minimization
Xiangyu Wu, Feng Yu, Qing-Guo Chen, Yang Yang, Jianfeng Lu

TL;DR
This paper introduces Bound Entropy Minimization (BEM), a test-time adaptation objective for multi-label classification that raises the confidence of multiple top predicted labels simultaneously, outperforming existing methods on MSCOCO, VOC, and NUSWIDE.
Contribution
The paper proposes BEM, a new objective for multi-label test-time adaptation that considers multiple top predicted labels together, addressing limitations of traditional entropy minimization.
Findings
BEM outperforms state-of-the-art methods on MSCOCO, VOC, and NUSWIDE datasets.
BEM effectively adapts across various model architectures and label scenarios.
The approach improves confidence in multiple labels simultaneously during test-time adaptation.
Abstract
Mainstream test-time adaptation (TTA) techniques endeavor to mitigate distribution shifts via entropy minimization for multi-class classification, inherently increasing the probability of the most confident class. However, when encountering multi-label instances, the primary challenge stems from the varying number of labels per image, and prioritizing only the highest-probability class inevitably undermines the adaptation of other positive labels. To address this issue, we investigate TTA within the multi-label scenario (ML-TTA), developing a Bound Entropy Minimization (BEM) objective to simultaneously increase the confidence of multiple top predicted labels. Specifically, to determine the number of labels for each augmented view, we retrieve a paired caption with yielded textual labels for that view. These labels are allocated to both the view and caption, called weak label set and strong…
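The contrast the abstract draws between plain entropy minimization and an objective over several top labels can be illustrated with a small sketch. This is not the paper's BEM formulation, which is not reproduced on this page. It assumes a hypothetical binding rule in which the probabilities of the k highest-scoring labels are summed into one group before the entropy is computed. The helper names (`bound_probs`, `bound_entropy`) and the fixed `k` are illustrative assumptions; per the abstract, the actual method derives the number of labels for each view from a retrieved caption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bound_probs(logits, k):
    """Hypothetical binding rule (an assumption, not the paper's definition):
    sum the probabilities of the k largest logits into a single group."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top, rest = order[:k], order[k:]
    return [sum(probs[i] for i in top)] + [probs[i] for i in rest]


def bound_entropy(logits, k):
    """Entropy of the distribution after binding the top-k labels."""
    return entropy(bound_probs(logits, k))


# Two equally confident positive labels and two negatives.
logits = [5.0, 5.0, 0.0, 0.0]
plain = entropy(softmax(logits))   # stays above ln 2: the two positives compete
bound = bound_entropy(logits, 2)   # near zero: the two positives count as one group
```

In this toy case the plain entropy cannot fall below ln 2 unless one of the two positive labels is suppressed, whereas the bound variant is already close to zero, so minimizing it does not force the top labels to compete. With `k = 1` the bound variant reduces to plain entropy.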
Peer Reviews
Decision: ICLR 2025 Poster
1. This paper focuses on an important question. 2. This paper has a good theoretical analysis. 3. The proposed method achieves better results than the baselines.
1. Eq. (6) is quite difficult to understand; more explanation is needed to show its meaning. The authors should explain more about how weak labels and strong labels are recognized in the proposed method, and the meaning of $\hat{s}_{ij}^{x^{test}}$. 2. It is unclear which parameters are learnable in this method. The authors need to clearly point out all the learnable parameters. 3. The authors could explain more about the motivation of the view prompt and caption prompt, and why they are useful for
1. The paper demonstrates robust experimentation across diverse datasets (MSCOCO, VOC, NUSWIDE) and architectures (e.g., RN50, ViT-B/16), showcasing the generalizability and efficacy of the proposed method. 2. The introduction of the Bound Entropy Minimization (BEM) for Multi-Label Test-Time Adaptation (ML–TTA) is a significant theoretical and practical advancement. It effectively addresses the challenges inherent in multi-label test-time adaptation, a space where traditional single-label approa
1. The method section, particularly the mathematical formulations and algorithmic details, could be more clearly presented. The explanations surrounding the implementation of label binding and how the paired captions are retrieved need additional clarity for readers less familiar with the intricate mechanisms of vision-language model adaptations. 2. While the paper effectively shows ML–TTA's superiority over traditional methods, it would benefit from a more detailed discussion about the choice o
1. The proposed Bound Entropy Minimization (BEM) method presents an innovative solution to improve test-time adaptation in multi-label scenarios. 2. The use of paired captions as pseudo-labels is a clever strategy to determine the number of positive labels for each test instance. 3. It considers both visual and textual modalities, optimizing for a more robust adaptation to distribution shifts. 4. The figures are well presented.
1. More detailed motivation behind the model design is preferred. It is important to explain why the authors propose the method in this work. 2. The proposed method involves multiple steps, including view augmentation, caption retrieval, and label binding, which might introduce complexity in practical implementation. Simplifying the process could enhance usability. 3. The effectiveness of the method heavily relies on the quality and relevance of the paired captions. In real-world scenarios, capt
Taxonomy
Topics: Educational Technology and Assessment
