Binary Losses for Density Ratio Estimation
Werner Zellinger

TL;DR
This paper investigates how the choice of binary loss functions affects density ratio estimation accuracy, providing a theoretical framework for designing loss functions that prioritize different density ratio ranges, and demonstrating improved performance in real-world tasks.
Contribution
It characterizes all loss functions that yield small error in density ratio estimation within a Bregman divergence framework and offers a recipe for constructing loss functions with desired properties.
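For orientation, the error measure in a Bregman divergence framework can be sketched in the form common in the density ratio literature. The symbols $\phi$, $\beta$, $\hat\beta$ and $q$ below are illustrative assumptions and may differ from the paper's own notation: $\beta = p/q$ is the true density ratio, $\hat\beta$ an estimate, and $\phi$ a strictly convex, differentiable generator.

```latex
% Bregman divergence between the true ratio \beta and an estimate \hat\beta,
% averaged over the denominator density q (notation assumed, not the paper's).
B_\phi(\beta, \hat\beta)
  = \int \Big[ \phi\big(\beta(x)\big) - \phi\big(\hat\beta(x)\big)
      - \phi'\big(\hat\beta(x)\big)\,\big(\beta(x) - \hat\beta(x)\big) \Big]\, q(x)\, dx

% Worked special case: \phi(t) = \tfrac{1}{2}(t-1)^2 gives \phi'(t) = t - 1, and
% the integrand collapses to a squared error:
B_\phi(\beta, \hat\beta)
  = \frac{1}{2} \int \big(\beta(x) - \hat\beta(x)\big)^2\, q(x)\, dx
```

Different generators $\phi$ weight errors at small and large ratio values differently, which is the degree of freedom the loss-design recipe is concerned with.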
Findings
Novel loss functions outperform existing methods in 484 real-world tasks.
Theoretical characterization of loss functions for density ratio estimation.
Improved parameter choice in deep domain adaptation algorithms.
Abstract
Estimating the ratio of two probability densities from a finite number of observations is a central machine learning problem. A common approach is to construct estimators using binary classifiers that distinguish observations from the two densities. However, the accuracy of these estimators depends on the choice of the binary loss function, raising the question of which loss function to choose based on desired error properties. For example, traditional loss functions, such as logistic or boosting loss, prioritize accurate estimation of small density ratio values over large ones, even though the latter are more critical in many applications. In this work, we start with prescribed error measures in a class of Bregman divergences and characterize all loss functions that result in density ratio estimators with small error. Our characterization extends results on composite binary losses…
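The classifier-based approach described in the abstract can be illustrated with a minimal sketch. It is not the paper's method or code: it uses plain logistic loss with a linear model on synthetic 1D Gaussians, and the helper names `fit_logistic` and `estimate_ratio` are hypothetical. The estimate is the classifier's odds, corrected for the sample-size ratio.

```python
import numpy as np

def fit_logistic(x, y, steps=2000, lr=0.5):
    """Fit f(x) = w*x + b by gradient descent on the mean logistic loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(w * x + b)))  # predicted P(y=1 | x)
        grad = prob - y                            # derivative of loss w.r.t. the logit
        w -= lr * np.mean(grad * x)
        b -= lr * np.mean(grad)
    return w, b

def estimate_ratio(x_p, x_q, x_eval):
    """Estimate p(x)/q(x) from samples via a binary classifier (hypothetical helper)."""
    x = np.concatenate([x_p, x_q])
    y = np.concatenate([np.ones(len(x_p)), np.zeros(len(x_q))])  # 1 = drawn from p
    w, b = fit_logistic(x, y)
    # Classifier odds P(y=1|x)/P(y=0|x) equal exp(logit); rescale by n_q/n_p
    # to undo the class prior and recover the density ratio.
    return np.exp(w * x_eval + b) * len(x_q) / len(x_p)

# Synthetic check (assumed setup): p = N(1, 1), q = N(0, 1), so p/q = exp(x - 0.5).
rng = np.random.default_rng(0)
x_p = rng.normal(1.0, 1.0, 5000)
x_q = rng.normal(0.0, 1.0, 5000)
x_eval = np.linspace(-2.0, 3.0, 6)
est = estimate_ratio(x_p, x_q, x_eval)
true = np.exp(x_eval - 0.5)
print(np.round(est, 3))
print(np.round(true, 3))
```

The linear model is well specified here only because the two Gaussians share a variance, which makes the true log-ratio linear in x. Swapping the logistic loss for another binary loss changes which range of ratio values is estimated most accurately, which is the question the paper studies.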
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper appears to be self-contained and introduces all the tools and notation it uses. 2. The experimental results are fairly extensive for a theoretical paper. 3. The theoretical results, especially Lemma 4 in the appendix, seem to rely on non-trivial applications of several previously established results.
1. The presentation of the paper could be significantly improved. The paper uses heavy notation — for example, understanding $B_{-\underline{L}^\circ}(\beta,g\circ f)$ requires substantial effort. Another clear example is Remark 1, which is completely incomprehensible unless the reader is already familiar with everything it covers. 2. The novelty of this work is unclear. I believe the authors would agree that the main contribution of this paper is theoretical and primarily represented by Theore
References to existing research are sufficiently provided.
#### Major Weaknesses: 1. There are concerns regarding the novelty of the theoretical results presented in this study. Specifically, results such as the necessity of Equation (8) in Theorem 1 appear to be easily derived from findings in prior work referenced by this study ([1], [2], and [3]). A detailed examination of this issue is provided below. 2. Additionally, the canonical form of the density ratio link, given in Equation (10), does not constitute a new result, as it can be derived from re
1. Compared to the related literature, this paper introduces a new framework for constructing novel loss functions, prioritizing an accurate estimation of large density ratio values over smaller ones. 2. It provides a thorough mathematical foundation, characterizing the types of loss functions that align with specific error measures derived from Bregman divergences. The comparison with the related literature is good. 3. The work shows large practical implementation through empirical data and re
1. The paper does not delve into the sample complexity of the proposed methods, which could be critical for understanding their efficiency in various scenarios. 2. While it improves estimation for large density values, the impact on performance for smaller values isn't thoroughly explored. 3. A more detailed introduction of the experiments should be considered.
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Signal Denoising Methods
