Learning from both experts and data
Rémi Besson, Erwan Le Pennec, Stéphanie Allassonnière

TL;DR
This paper introduces a method for combining expert knowledge and empirical data to estimate discrete probability distributions, especially when data are scarce, and shows that the resulting estimator is more efficient than the better of the two sources alone, up to a constant.
Contribution
A novel estimator that objectively balances expert knowledge and data, with efficiency gains in the intermediate data regime supported both theoretically and empirically.
Findings
Estimator outperforms individual models in efficiency
Method adaptively weights expert knowledge and data
Theoretical guarantee: more efficient than the better of the expert and data models, up to a constant
Abstract
In this work we study the problem of inferring a discrete probability distribution using both expert knowledge and empirical data. This is an important issue for many applications where the scarcity of data prevents a purely empirical approach. In this context, it is common to rely first on a prior built from initial domain knowledge before proceeding to online data acquisition. We are particularly interested in the intermediate regime where we do not have enough data to do without the experts' initial prior, but have enough to correct it if necessary. We present a novel way to tackle this issue, with a method that provides an objective way to choose the weight given to the experts relative to the data. We show, both empirically and theoretically, that our proposed estimator is always more efficient than the better of the two models (expert or data), up to a constant.
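The abstract does not spell out the estimator, so the following is only an illustrative sketch of how a data-dependent weight between expert and data can arise, not the authors' method as stated here. It assumes the estimate is the distribution closest to the expert prior inside an L2 ball centered on the empirical distribution, with a radius that shrinks as 1/sqrt(n). The function name combine_expert_and_data, the parameter radius_scale, the choice of L2 norm, and the radius formula are all assumptions made for illustration.

```python
import numpy as np


def combine_expert_and_data(p_expert, counts, radius_scale=1.0):
    """Illustrative sketch (assumed, not taken from the abstract).

    Returns the point closest to the expert prior inside an L2 ball
    centered on the empirical distribution. The radius
    radius_scale / sqrt(n) is a hypothetical choice.
    """
    p_expert = np.asarray(p_expert, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()

    # With no observations there is nothing to correct the expert with.
    if n == 0:
        return p_expert.copy()

    p_emp = counts / n
    eps = radius_scale / np.sqrt(n)  # assumed radius, shrinks as data grow
    dist = np.linalg.norm(p_expert - p_emp)

    # The expert is already compatible with the data: keep the expert.
    if dist <= eps:
        return p_expert.copy()

    # Otherwise project onto the ball. The result is a convex combination
    # of the two distributions, so it is still a probability vector.
    w_expert = eps / dist
    return w_expert * p_expert + (1.0 - w_expert) * p_emp


if __name__ == "__main__":
    expert = [0.5, 0.5]
    for counts in ([0, 0], [9, 1], [90, 10], [9000, 1000]):
        print(counts, combine_expert_and_data(expert, counts))
```

In this sketch the weight on the expert is not tuned by hand: it equals 1 while the expert lies within the ball around the empirical distribution, and decays toward 0 as more samples contradict the expert, which mirrors the intermediate regime the abstract describes.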
