Improving Deep Regression with Tightness

Shihao Zhang; Yuguang Yan; Angela Yao

arXiv:2502.09122·cs.LG·February 14, 2025

TL;DR

This paper proposes a novel regularizer based on optimal transport and a target duplication strategy to preserve ordinality in deep regression, reducing conditional entropy and improving generalization.

Contribution

It introduces a new regularizer and a target duplication method to better preserve target relationships, enhancing deep regression performance.
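As a rough illustration of what duplicating the regressor targets could look like, the sketch below assumes a regressor that emits M outputs per sample, each trained against a copy of the same scalar target, with the mean of the outputs used as the final prediction. The function names, the squared-error loss, and the averaging step are assumptions made for this example; the listing does not spell out the paper's exact formulation.

```python
def duplicated_target_mse(head_outputs, target):
    """Mean squared error of M outputs against M copies of one scalar target.

    Hypothetical helper: `head_outputs` holds the M values produced for a
    single sample, and `target` is that sample's scalar label.
    """
    duplicated = [target] * len(head_outputs)  # the duplicated target vector
    squared_errors = [(p - t) ** 2 for p, t in zip(head_outputs, duplicated)]
    return sum(squared_errors) / len(squared_errors)


def aggregate_prediction(head_outputs):
    """Average the M outputs into one prediction (an assumed aggregation rule)."""
    return sum(head_outputs) / len(head_outputs)


outputs = [1.0, 3.0]          # two outputs for one sample
loss = duplicated_target_mse(outputs, 2.0)
prediction = aggregate_prediction(outputs)
```

In this toy case both outputs miss the target by 1, so the loss is nonzero even though their average matches the target exactly; each output is penalized individually rather than only through the mean.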

Findings

01. The proposed methods improve regression accuracy on real-world tasks.

02. Reducing $H(Z \mid Y)$ correlates with better generalization.

03. Experimental results confirm the effectiveness of the strategies.

Abstract

For deep regression, preserving the ordinality of the targets with respect to the feature representation improves performance across various tasks. However, a theoretical explanation for the benefits of ordinality is still lacking. This work reveals that preserving ordinality reduces the conditional entropy $H(Z \mid Y)$ of representation $Z$ conditional on the target $Y$. However, our findings reveal that typical regression losses do little to reduce $H(Z \mid Y)$, even though it is vital for generalization performance. With this motivation, we introduce an optimal transport-based regularizer to preserve the similarity relationships of targets in the feature space to reduce $H(Z \mid Y)$. Additionally, we introduce a simple yet efficient strategy of duplicating the regressor targets, also with the aim of reducing $H(Z \mid Y)$. Experiments on three real-world regression tasks verify the effectiveness of…
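To make the quantity $H(Z \mid Y)$ concrete, here is a minimal sketch of a plug-in estimate from paired discrete samples. It assumes features and targets have already been discretized into bins, which is a simplification for illustration only (the representations in the paper are continuous); the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def conditional_entropy(z_bins, y_bins):
    """Plug-in estimate of H(Z | Y) in bits from paired discrete samples."""
    n = len(z_bins)
    joint = Counter(zip(z_bins, y_bins))  # counts of (z, y) pairs
    marginal_y = Counter(y_bins)          # counts of y alone
    entropy = 0.0
    for (z, y), count in joint.items():
        # -p(z, y) * log2 p(z | y), with p(z | y) = count / marginal_y[y]
        entropy += (count / n) * math.log2(marginal_y[y] / count)
    return entropy


# Tight: every sample with the same target falls into the same feature bin.
tight = conditional_entropy([0, 0, 1, 1], [10, 10, 20, 20])
# Loose: samples sharing a target are spread over different feature bins.
loose = conditional_entropy([0, 1, 2, 3], [10, 10, 20, 20])
```

Under these assumptions the tight configuration yields zero conditional entropy, while the loose one yields one bit, since each target value is split evenly over two feature bins.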

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

+ The proposed method achieves superior performance compared to prior deep regression techniques on two benchmark datasets, demonstrating its effectiveness.
+ The authors provide an interesting analysis on why ordinal feature spaces are not naturally learned under typical regression loss functions, highlighting a crucial aspect often overlooked in regression tasks.
+ The authors offer a comparison between classification and regression, explaining why classification losses tend to better constr

Weaknesses

Although the authors discuss why minimizing Mean Squared Error (MSE) may fail to learn ordinal feature spaces, they do not provide empirical results or visualizations, such as t-SNE plots, to support this claim. Comparative visualizations between the proposed method and RankSim would strengthen the discussion.

Inconsistency between Eq. 3 and Eq. 5: should both be defined at the batch level, for example by changing N to b?

The rationale for employing multiple regressors is unclear. Why is a multi-r

Reviewer 02 · Rating 6 · Confidence 1

Strengths

1. The paper is well written overall. The proposal of linking conditional entropy with ordinality preservation in regression also seems new and interesting.
2. The authors propose the ROT Regularizer and a multi-target learning strategy, both of which are innovative methods for improving regression. These techniques address the limitations of standard regression by refining the feature space structure and better preserving relationships among targets, enabling more robust and accurate prediction

Weaknesses

I have limited knowledge in this specific area and currently lack the expertise to identify any potential weaknesses in the paper. The overall manuscript appears satisfactory to me. The authors may gain additional insights for refinement through feedback from reviewers more experienced in this topic.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- S1: The finding that the updated directions of the features are limited in normal regression training is intriguing.
- S2: The finding is theoretically supported.
- S3: The paper is well-organized and easy to follow.

Weaknesses

- W1: The difference between the global and local tightness is ambiguous. While $\mathcal{H}(\mathbf{Z}|\mathbf{Y})$ is called tightness, formulating the global and local tightness would strengthen the theoretical analysis.
- W2: The justification of design choices of the proposed methods needs to be clarified.
  - For MT, is there a possibility that the multiple regressors will collapse into a single solution? Are the solution spaces $S_y$ orthogonal?
  - For ROT-Reg, is the self-entrop

Code & Models

Repositories

needylove/regression_tightness
PyTorch · Official

Taxonomy

Topics: Machine Learning and Data Classification · Neural Networks and Applications · Face and Expression Recognition