Meta-learning Representations for Learning from Multiple Annotators

Atsutoshi Kumagai; Tomoharu Iwata; Taishi Nishiyama; Yasutoshi Ida; Yasuhiro Fujiwara

arXiv:2506.10259·cs.LG·June 13, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a meta-learning approach that leverages related tasks and latent space embeddings to effectively learn from multiple noisy annotators, especially when limited annotated data is available.

Contribution

The proposed method combines meta-learning, latent-space embedding, and the EM algorithm to improve learning from noisy annotations when data are scarce, and it is reported to outperform existing approaches.

Findings

01. Effective on real-world noisy datasets

02. Improves classification accuracy with limited data

03. Efficient backpropagation through EM steps

Abstract

We propose a meta-learning method for learning from multiple noisy annotators. In many applications, such as crowdsourcing services, labels for supervised learning are given by multiple annotators. Since the annotators have different skills or biases, the given labels can be noisy. To learn accurate classifiers, existing methods require a large amount of noisily annotated data. However, sufficient data might be unavailable in practice. To overcome the lack of data, the proposed method uses labeled data obtained in different but related tasks. The proposed method embeds each example of a task into a latent space by using a neural network and constructs a probabilistic model for learning a task-specific classifier while estimating annotators' abilities in the latent space. This neural network is meta-learned to improve the expected test classification performance when the classifier is adapted to a given small…
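The abstract is cut off before the model is spelled out, so the following is only a rough illustration of how annotator abilities can be estimated with EM. It is a classic Dawid-Skene-style sketch over raw labels with one confusion matrix per annotator; it is not the paper's latent-space model and involves no neural network or meta-learning. The names `dawid_skene_em` and `Labels` and the smoothing constant are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: one dict per example, {annotator_id: given_label}.
Labels = List[Dict[int, int]]


def dawid_skene_em(
    labels: Labels, n_classes: int, n_annotators: int, n_iters: int = 20
) -> Tuple[List[List[float]], List[List[List[float]]]]:
    """Return (posteriors over true labels, per-annotator confusion matrices)."""
    smooth = 1e-2  # assumed smoothing constant; avoids zero probabilities

    # Initialise the posteriors with smoothed vote fractions (soft majority vote).
    post: List[List[float]] = []
    for votes in labels:
        counts = [smooth] * n_classes
        for y in votes.values():
            counts[y] += 1.0
        total = sum(counts)
        post.append([c / total for c in counts])

    conf: List[List[List[float]]] = []
    for _ in range(n_iters):
        # M-step: closed-form updates of the class prior and confusion matrices.
        prior = [sum(p[k] for p in post) / len(post) for k in range(n_classes)]
        conf = [[[smooth] * n_classes for _ in range(n_classes)]
                for _ in range(n_annotators)]
        for p, votes in zip(post, labels):
            for a, y in votes.items():
                for k in range(n_classes):
                    conf[a][k][y] += p[k]  # soft count: true class k, given label y
        for a in range(n_annotators):
            for k in range(n_classes):
                row_sum = sum(conf[a][k])
                conf[a][k] = [v / row_sum for v in conf[a][k]]

        # E-step: posterior over the true label of each example.
        post = []
        for votes in labels:
            scores = list(prior)
            for a, y in votes.items():
                for k in range(n_classes):
                    scores[k] *= conf[a][k][y]
            total = sum(scores)
            post.append([s / total for s in scores])
    return post, conf


# Toy usage: annotators 0 and 1 are reliable, annotator 2 always flips the label.
toy_truth = [0, 1, 0, 1, 0, 1, 0, 1]
toy_labels = [{0: t, 1: t, 2: 1 - t} for t in toy_truth]
toy_post, toy_conf = dawid_skene_em(toy_labels, n_classes=2, n_annotators=3)
```

In this toy setting, the confusion matrix learned for annotator 2 concentrates on the off-diagonal entries, so that annotator's votes still carry information once its bias is modelled. Both updates are closed-form, which is the property Reviewer 01 credits for the method's efficiency.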

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. Unlike other meta-learning approaches such as MAML, which uses second-order gradients, the proposed model uses the EM algorithm to obtain closed-form updates, making it computationally more efficient.
2. By training on multiple source tasks, the model learns how to handle new, unseen noisy target tasks.
3. The pseudo-annotation strategy is interesting: noise is artificially added to clean source data to simulate the real-world setting of noisy annotators.
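The pseudo-annotation idea in point 3 can be sketched as follows. This is a minimal illustration, assuming each simulated annotator is correct with some fixed probability and otherwise picks a wrong class uniformly at random; the paper's actual noise model is not given in this excerpt. The function names, the accuracy range, and the `label_prob` parameter are hypothetical.

```python
import random
from typing import Dict, List


def uniform_noise_confusion_matrix(n_classes: int, accuracy: float) -> List[List[float]]:
    """Row k is the label distribution a simulated annotator gives for true class k."""
    off_diag = (1.0 - accuracy) / (n_classes - 1)  # requires n_classes >= 2
    return [[accuracy if j == k else off_diag for j in range(n_classes)]
            for k in range(n_classes)]


def pseudo_annotate(
    clean_labels: List[int],
    n_classes: int,
    n_annotators: int,
    rng: random.Random,
    label_prob: float = 1.0,
) -> List[Dict[int, int]]:
    """Turn clean labels into noisy labels from several simulated annotators."""
    # Each simulated annotator gets its own accuracy (assumed range 0.3 to 0.95).
    matrices = [uniform_noise_confusion_matrix(n_classes, rng.uniform(0.3, 0.95))
                for _ in range(n_annotators)]
    noisy: List[Dict[int, int]] = []
    for y in clean_labels:
        votes: Dict[int, int] = {}
        for a, matrix in enumerate(matrices):
            # An annotator labels each example only with probability label_prob.
            if rng.random() < label_prob:
                votes[a] = rng.choices(range(n_classes), weights=matrix[y])[0]
        noisy.append(votes)
    return noisy


# Toy usage: four clean labels, three classes, four simulated annotators.
noisy_labels = pseudo_annotate([0, 1, 2, 1], n_classes=3, n_annotators=4,
                               rng=random.Random(0))
```

Because the clean labels remain available, a meta-learner trained on such data can be scored against ground truth while only ever seeing the noisy votes during adaptation.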

Weaknesses

1. Assumption: The paper assumes that noise comes from multiple annotators with different error patterns and biases, and that each annotator's behavior is consistent within a task. This is not accurate for real-life data, where each annotator is domain-specific.
2. Novelty: The paper combines well-established methods such as GMMs and annotator-specific confusion matrices within a meta-learning framework to handle noisy labels, so the novelty is somewhat limited; for example, GMM-based clustering has already…

Reviewer 02 · Rating 3 · Confidence 5

Strengths

Originality: The paper introduces a novel approach that integrates meta-learning with the handling of noisy annotations, representing an innovative solution to a well-recognized problem in machine learning.
Quality: The methodology is technically sound, employing a clear framework and the well-established EM algorithm for parameter estimation. The approach is backed by empirical results, showcasing its effectiveness across various datasets.
Significance: This work is significant as it addresses…

Weaknesses

1. The writing is not clear enough. For example, the description of the proposed method in the last two paragraphs of INTRODUCTION is lengthy and lacks logic. The RELATED WORK, DATA, and COMPARISON METHODS sections are full of nonsense and lack structure.
2. The experimental comparison lacks methods from the last two years.
3. The datasets used in the experiments are all image datasets and lack representativeness.

Reviewer 03 · Rating 5 · Confidence 4

Strengths

* **Originality:** The paper takes an interesting approach to tackling noisy annotations in a way that considers both the variability in annotator reliability and the scarcity of target task data. By simulating noisy labels during meta-training, the method closely approximates real-world challenges.
* **Quality:** The model is theoretically grounded, with empirical testing on multiple datasets to support its claims. Using the EM algorithm within meta-learning allows the model to adapt efficient…

Weaknesses

* **Computational Complexity:** The meta-learning phase involves training over multiple source tasks with iterative EM updates and may be computationally demanding on larger datasets or in high-dimensional settings.
* **Impractical Settings:** High-quality samples of the kind assumed for the source tasks might not be available in practice. Even most existing test sets (e.g., CIFAR-10, CIFAR-100, ImageNet) contain many label errors, as mentioned in R1. Noting that the testing phase…

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning