Salutary Labeling with Zero Human Annotation
Wenxiao Xiao, Hongfu Liu

TL;DR
This paper introduces salutary labeling, a novel approach that automatically assigns beneficial labels to informative samples without human annotation, leveraging influence functions to improve model training efficiency and performance.
Contribution
It proposes a new automatic labeling method using influence functions, eliminating the need for costly human annotations in active learning.
Findings
Outperforms traditional active learning on nine benchmark datasets
Reduces labeling costs by eliminating human annotation
Also validated on a large language model fine-tuning task
Abstract
Active learning strategically selects informative unlabeled data points and queries their ground truth labels for model training. The prevailing assumption underlying this machine learning paradigm is that acquiring these ground truth labels will optimally enhance model performance. However, this assumption may not always hold true or maximize learning capacity, particularly considering the costly labor annotations required for ground truth labels. In contrast to traditional ground truth labeling, this paper proposes salutary labeling, which automatically assigns the most beneficial labels to the most informative samples without human annotation. Specifically, we utilize the influence function, a tool for estimating sample influence, to select newly added samples and assign their salutary labels by choosing the category that maximizes their positive influence. This process eliminates…
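The selection-and-labeling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a binary logistic regression model, a small labeled validation set, and the standard first-order influence approximation. The function names (`grad_loss`, `hessian`, `salutary_labels`) and the `budget` and `damping` parameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(theta, X, y):
    """Per-sample gradients of the logistic loss, shape (n, d)."""
    return (sigmoid(X @ theta) - y)[:, None] * X

def hessian(theta, X, damping=1e-3):
    """Mean Hessian of the logistic loss on X, damped so it stays invertible."""
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X / len(X) + damping * np.eye(X.shape[1])

def salutary_labels(theta, X_train, X_val, y_val, X_pool, budget):
    """Pick `budget` pool points and the label that helps validation loss most.

    Assumption: "positive influence" is taken to mean the estimated decrease
    in validation loss when the point (x, y) is upweighted in training.
    """
    H = hessian(theta, X_train)
    g_val = grad_loss(theta, X_val, y_val).mean(axis=0)
    s = np.linalg.solve(H, g_val)  # H^{-1} g_val (H is symmetric)
    # Upweighting (x, y) changes validation loss by about -g_val^T H^{-1} grad(x, y),
    # so the estimated benefit (loss decrease) is s . grad(x, y).
    benefit = np.stack(
        [grad_loss(theta, X_pool, np.full(len(X_pool), c)) @ s for c in (0.0, 1.0)],
        axis=1,
    )  # shape (n_pool, 2): one column per candidate label
    labels = benefit.argmax(axis=1)          # most beneficial label per point
    best = benefit.max(axis=1)               # its estimated benefit
    chosen = np.argsort(-best)[:budget]      # points with the largest benefit
    return chosen, labels[chosen]
```

In this sketch the assigned label does not have to match the ground truth; it is simply the category whose addition is estimated to reduce validation loss the most, which is the distinction the abstract draws between salutary and ground truth labeling.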
Peer Reviews
Decision · Submitted to ICLR 2025
Solid experiments are conducted on both tabular and image datasets, integrating recent data selection methods from AL and SSL. The proposed method demonstrates promising empirical results without the need for human-annotated labels. In addition, the method is validated on an LLM fine-tuning task, which further underscores its potential for application in different domains.
While the method is framed in part within the active learning context, its approach, assigning pseudo-labels to unlabeled data via a self-training mechanism, seems more aligned with semi-supervised learning. The introduction could benefit from adjustments to reflect this alignment more accurately. Another concern is the limited technical contribution. It uses the influence function to score and pseudo-label the unlabeled data. Although this is an interesting application, it may not represent a
1. The motivation and writing are clear.
2. The experiments are sufficient.
3. The method is fully automatic and requires no human annotation.
1. The authors do not discuss how the method differs from unsupervised learning methods such as [1] Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID and [2] Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification. If human intervention is removed from active learning, it becomes unsupervised learning that assigns pseudo labels to the samples. Could you discuss the difference?
2. How to tackle with
- The *Salutary Labeling for Active Learning* framework was new to me. If we can effectively annotate data automatically without human annotators, it could demonstrate remarkable potential for machine learning as a whole.
- Overall, the writing is well-structured and easy to follow, making it straightforward to understand the main concepts.
- The proposed method consistently improved performance across nine datasets. This study also conducted detailed ablation studies to analyze the effectivene
- The motivation behind combining salutary labeling with active learning is not fully clear to me.
- The core motivation of active learning is to label only a small number of informative data points to reduce annotation costs. If automatic labeling without human annotation costs is feasible, applying salutary labels to all available data without the selection process in active learning should suffice.
- In Algorithm 1, it seems that salutary labels are generated for the entire unlabeled po
Taxonomy
Topics: Image Retrieval and Classification Techniques · Data Visualization and Analytics · Digital Image Processing Techniques
