Can We Evaluate Domain Adaptation Models Without Target-Domain Labels?

Jianfei Yang; Hanjie Qian; Yuecong Xu; Kai Wang; Lihua Xie

arXiv:2305.18712 · cs.CV · February 20, 2024 · 1 citation

Open Access 3 Reviews

TL;DR

This paper introduces the Transfer Score, a novel unsupervised metric for evaluating domain adaptation models without target labels, enabling model selection, hyperparameter tuning, and checkpoint identification.

Contribution

The paper proposes the Transfer Score metric for unsupervised evaluation of UDA models, addressing the challenge of performance assessment without target labels.

Findings

01

The Transfer Score effectively selects the best UDA method.

02

It optimizes hyperparameters to prevent model degeneration.

03

It identifies the best checkpoint of UDA models.

Abstract

Unsupervised domain adaptation (UDA) involves adapting a model trained on a label-rich source domain to an unlabeled target domain. However, in real-world scenarios, the absence of target-domain labels makes it challenging to evaluate the performance of UDA models. Furthermore, prevailing UDA methods relying on adversarial training and self-training could lead to model degeneration and negative transfer, further exacerbating the evaluation problem. In this paper, we propose a novel metric called the Transfer Score to address these issues. The proposed metric enables the unsupervised evaluation of UDA models by assessing the spatial uniformity of the classifier via model parameters, as well as the transferability and discriminability of deep representations. Based on the metric, we achieve three novel objectives without target-domain labels: (1) selecting the best UDA method…
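To make the idea of label-free checkpoint selection concrete, here is a minimal sketch. It is not the paper's Transfer Score: the exact formula is not given on this page, and the classifier-uniformity term computed from model parameters is not implemented here. It only illustrates the two criteria that Reviewer 01 attributes to the metric, clustering (confident predictions) and class balance, computed from target-domain logits alone. The names `proxy_score` and `select_checkpoint` and the equal weighting of the two terms are assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def proxy_score(logits):
    """Hypothetical label-free proxy (NOT the paper's Transfer Score).

    Higher is better. Combines two terms, each scaled to [0, 1]:
      - confidence: 1 - normalized mean entropy of per-sample predictions
      - balance: normalized entropy of the average prediction over classes
    """
    p = softmax(np.asarray(logits, dtype=float))
    n_classes = p.shape[1]
    eps = 1e-12  # avoids log(0)
    mean_entropy = -(p * np.log(p + eps)).sum(axis=1).mean()
    confidence = 1.0 - mean_entropy / np.log(n_classes)
    marginal = p.mean(axis=0)
    balance = -(marginal * np.log(marginal + eps)).sum() / np.log(n_classes)
    return confidence + balance  # equal weighting is an assumption


def select_checkpoint(logits_by_checkpoint):
    """Return the checkpoint name with the highest proxy score, plus all scores."""
    scores = {name: proxy_score(l) for name, l in logits_by_checkpoint.items()}
    return max(scores, key=scores.get), scores


# Toy target-domain logits for three hypothetical checkpoints (3 classes).
candidates = {
    "balanced_confident": np.tile(10.0 * np.eye(3), (4, 1)),
    "collapsed": np.tile(np.array([[10.0, 0.0, 0.0]]), (12, 1)),
    "uncertain": np.zeros((12, 3)),
}
best, scores = select_checkpoint(candidates)
```

In this toy setup the collapsed checkpoint (every sample assigned to one class) is penalized by the balance term and the uncertain checkpoint by the confidence term. As Reviewer 01 notes, a balance criterion of this kind can mislead when the target domain is genuinely imbalanced.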

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 5

Strengths

1. Evaluating UDA models in an unsupervised manner is very important, but the community currently lacks relevant research.
2. Experiments demonstrate the effectiveness of the “Transfer Score” in method selection, hyperparameter tuning, and checkpoint selection.
3. The paper is well written and easy to understand.

Weaknesses

1. The “Transfer Score” uses clustering and class balance as its measurement criteria. However, the clustering and class-balance assumptions may not always hold. In many real-world tasks, class imbalance exists. For example, in cross-domain semantic segmentation, severe category imbalance is often present [1], which limits the applicability of this metric.
2. Some methods [2,3] directly adopt Eq. 1 and Eq. 4 as optimization objectives. For these methods, the transfer score

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

+ The proposed metric is meaningful and intuitive.
+ The proposed metric seems useful in various evaluation settings.

Weaknesses

- The authors state that “prevailing UDA methods relying on adversarial training and self-training could lead to model degeneration and negative transfer”. How can this claim be proven? Can the proposed metric be applied to these types of methods to improve performance in regular UDA settings?
- Although the proposed metric is intuitive, it is difficult to validate that it is definitely correct for evaluating a UDA model. In fact, various existing UDA approaches also adopt “Transfer Scor

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- This idea is simple and easy to follow.

Weaknesses

- The experiments are weak.
- Task 1 and Task 2 are actually unsupervised model evaluation problems, so unsupervised validation methods such as SND can be employed directly, but the authors do not compare their method with them.
- In the experiments, only six UDA methods were evaluated. In Task 2, there are only five candidate hyperparameters. These numbers are too small to demonstrate the effectiveness of the proposed method.
- The UDA datasets employed in the experiments are not

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning