Active Learning for Regression with Aggregated Outputs
Tomoharu Iwata

TL;DR
This paper introduces an active learning approach for regression tasks where only aggregated outputs are observable. It uses mutual information to select informative sets and Bayesian models for efficient computation, which improves performance with fewer labeled sets.
Contribution
It proposes a novel active learning method for regression with aggregated outputs, utilizing mutual information and Bayesian models for efficient set selection.
Findings
Achieves better predictive performance with fewer labeled sets.
Uses mutual information for effective set selection.
Demonstrates effectiveness across various datasets.
Abstract
In some real-world applications, individual outputs cannot be observed for each instance because of privacy protection or the difficulty of data collection; only aggregated outputs, summed over the instances in a set, are available. To reduce the labeling cost of training regression models on such aggregated data, we propose an active learning method that sequentially selects the sets to be labeled, so that predictive performance improves with fewer labeled sets. As the selection criterion, the proposed method uses mutual information, which quantifies how much observing a set's aggregated output reduces the uncertainty of the model parameters. By modeling the output given an input with Bayesian linear basis function models, which include approximated Gaussian processes and neural networks, we can calculate the mutual information efficiently in closed form. With the experiments…
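The selection loop described in the abstract can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the basis is random Fourier features approximating an RBF-kernel Gaussian process (one possible "approximated Gaussian process"), Gaussian noise with precision beta is added once to each aggregated output, the hyperparameters alpha, beta, and lengthscale are fixed by hand, and all names (make_feature_map, AggregatedBayesLinear, active_learning) are hypothetical. Because the aggregated output is linear in the weights, Y_S = w^T Phi_S + eps with Phi_S = sum_{i in S} phi(x_i), the mutual information between w and Y_S given the labeled sets is 0.5 * log(1 + beta * Phi_S^T Sigma Phi_S), where Sigma is the current posterior covariance of w.

```python
import numpy as np

# Hypothetical sketch: the basis, noise model, and hyperparameters below are
# assumptions for illustration, not the settings used in the paper.


def make_feature_map(input_dim, n_features=50, lengthscale=0.5, seed=0):
    """Random Fourier features approximating an RBF-kernel GP (assumed basis)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(input_dim, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


class AggregatedBayesLinear:
    """Bayesian linear model trained on set-level (summed) outputs.

    Assumed model: Y_S = w^T Phi_S + eps, Phi_S = sum_{i in S} phi(x_i),
    w ~ N(0, alpha^{-1} I), eps ~ N(0, beta^{-1}).
    """

    def __init__(self, n_features, alpha=1.0, beta=100.0):
        self.beta = beta
        self.precision = alpha * np.eye(n_features)  # posterior precision of w
        self.b = np.zeros(n_features)  # beta * sum over labeled sets of Phi_S * Y_S

    def update(self, phi_set, y_agg):
        """Condition on one labeled set (per-instance features, summed output)."""
        Phi = phi_set.sum(axis=0)
        self.precision += self.beta * np.outer(Phi, Phi)
        self.b += self.beta * y_agg * Phi

    def mutual_information(self, phi_set):
        """I(w; Y_S | labeled data) = 0.5 * log(1 + beta * Phi_S^T Sigma Phi_S)."""
        Phi = phi_set.sum(axis=0)
        var = Phi @ np.linalg.solve(self.precision, Phi)
        return 0.5 * np.log1p(self.beta * var)

    def predict(self, phi):
        """Posterior-mean prediction of individual outputs w^T phi(x)."""
        return phi @ np.linalg.solve(self.precision, self.b)


def active_learning(pool_sets, oracle, n_rounds, feature_map, n_features):
    """Greedily label the unlabeled set with the largest mutual information."""
    model = AggregatedBayesLinear(n_features)
    feats = [feature_map(X) for X in pool_sets]
    unlabeled = set(range(len(pool_sets)))
    chosen = []
    for _ in range(n_rounds):
        k = max(unlabeled, key=lambda j: model.mutual_information(feats[j]))
        model.update(feats[k], oracle(k))  # only the sum over set k is revealed
        unlabeled.remove(k)
        chosen.append(k)
    return model, chosen


# Toy usage: 1-D inputs, true f(x) = sin(3x), sets of 2 to 6 instances.
rng = np.random.default_rng(1)
pool = [rng.uniform(-2, 2, size=(rng.integers(2, 7), 1)) for _ in range(40)]
oracle = lambda k: np.sin(3 * pool[k][:, 0]).sum()  # noise-free aggregate
fmap = make_feature_map(input_dim=1, n_features=50)
model, chosen = active_learning(pool, oracle, n_rounds=15, feature_map=fmap, n_features=50)
X_test = np.linspace(-2, 2, 5)[:, None]
print(chosen)
print(np.round(model.predict(fmap(X_test)), 2))
```

The mutual information depends only on Phi_S and the posterior covariance, not on the unobserved output, so every candidate set can be scored before it is labeled. Each label adds a rank-one term to the posterior precision; this sketch re-solves the linear system per candidate for clarity, whereas a faster version could maintain Sigma directly with the Sherman-Morrison formula.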
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Data Stream Mining Techniques
