Geometrically Aligned Transfer Encoder for Inductive Transfer in Regression Tasks
Sung Moon Ko, Sumin Lee, Dae-Woong Jeong, Woohyung Lim, Sehui Han

TL;DR
This paper introduces GATE, a novel transfer learning method for regression tasks that leverages differential geometry to align latent spaces on Riemannian manifolds, improving transfer performance and stability.
Contribution
The paper proposes GATE, a differential geometry-based transfer encoder that aligns latent spaces across tasks for regression, extending transfer learning beyond classification.
Findings
GATE outperforms conventional transfer methods on molecular graph datasets.
GATE provides stable transfer and extrapolation behavior.
The method effectively regularizes models in latent and extrapolation regions.
Abstract
Transfer learning is a crucial technique for handling a small amount of data that is potentially related to other abundant data. However, most of the existing methods are focused on classification tasks using images and language datasets. Therefore, in order to expand the transfer learning scheme to regression tasks, we propose a novel transfer technique based on differential geometry, namely the Geometrically Aligned Transfer Encoder (GATE). In this method, we interpret the latent vectors from the model to exist on a Riemannian curved manifold. We find a proper diffeomorphism between pairs of tasks to ensure that every arbitrary point maps to a locally flat coordinate in the overlapping region, allowing the transfer of knowledge from the source to the target data. This also serves as an effective regularizer for the model to behave in extrapolation regions. In this article, we…
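The reviews below refer to a consistency loss and a distance loss, and one reviewer asks for a toy example. The following is a minimal sketch of what such terms could look like, not the authors' implementation. The exact loss definitions are not given in this summary, so the function names (`consistency_loss`, `distance_loss`), the round-trip form, and the use of Euclidean distance between a point and a nearby perturbed copy are all assumptions made for illustration.

```python
import math


def linear_map(matrix):
    """Return a function applying a fixed matrix to a vector (hypothetical stand-in for a learned transfer map)."""
    return lambda v: [sum(a * b for a, b in zip(row, v)) for row in matrix]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def consistency_loss(points, forward, backward):
    """Assumed form: mean squared round-trip error ||backward(forward(z)) - z||^2."""
    return sum(sq_dist(backward(forward(z)), z) for z in points) / len(points)


def distance_loss(points, neighbors, forward):
    """Assumed form: mean squared mismatch between the distance from a point
    to its perturbed neighbor before and after applying the map."""
    total = 0.0
    for z, z_near in zip(points, neighbors):
        d_before = math.sqrt(sq_dist(z, z_near))
        d_after = math.sqrt(sq_dist(forward(z), forward(z_near)))
        total += (d_before - d_after) ** 2
    return total / len(points)


# Toy latent points and slightly perturbed copies (illustrative values only).
points = [[1.0, 0.0], [0.0, 1.0]]
neighbors = [[1.1, 0.0], [0.0, 1.1]]

# A 90-degree rotation and its inverse: preserves distances.
rotate = linear_map([[0.0, -1.0], [1.0, 0.0]])
rotate_back = linear_map([[0.0, 1.0], [-1.0, 0.0]])

# A uniform scaling and its inverse: invertible, but stretches distances.
scale = linear_map([[2.0, 0.0], [0.0, 2.0]])
scale_back = linear_map([[0.5, 0.0], [0.0, 0.5]])

print("rotation:", consistency_loss(points, rotate, rotate_back),
      distance_loss(points, neighbors, rotate))
print("scaling: ", consistency_loss(points, scale, scale_back),
      distance_loss(points, neighbors, scale))
```

In this toy setup, the rotation paired with its inverse satisfies both terms. The scaling paired with its inverse also round-trips perfectly, yet it changes distances, so only the distance term penalizes it. This is one way to see why a distance term can constrain a map beyond what a round-trip term alone does.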
Peer Reviews
Decision·ICLR 2024 poster
- Novel regularization procedure which also has the potential of being used outside of the scope of this paper.
- Superior performance when compared to other methods for transfer learning.
- Intuitive idea and easy to implement.
- Good experimental section, with a nice exploration of overfitting.

- The writing quality needs to be improved, there are both distracting grammar issues and, more importantly, the mathematical formulation of the method and description of the prerequisites for understanding this work have not been adequately presented.
- Section 5.2 is not well-supported, specifically the assertion "Ideally, if a model is well-guided by the right information and regularized properly, the overall geometry of the latent space may remain stable and not depend on the type of source
This paper proposes an interesting approach to transfer learning, and the contribution of each proposed feature (consistency loss, distance loss) is analyzed in an ablation study. It is very interesting that the distance loss can prevent overfitting. The choice of a dataset with 14 different tasks is suitable for a multi-task setup. The use of two different random splits shows careful consideration of the testing setup. The graphical check of the latent space across tasks helps build confid
The idea of enforcing cycle-consistency is not new, and it seems appropriate to cite related literature on cycle-consistency within transfer learning such as CyCADA (Hoffman et al. 2018). I found it difficult to follow the notation introduced to explain the method given the lack of explanation on the notations. More details are needed to describe the method precisely. It is not clear to me how distance loss helps prevent overfitting. A toy example would help with the intuition. The fact tha
Originality and significance: The Riemannian view of the latent space is likely not new to this work, but two of the loss functions are novel to this work (to the best of my knowledge). The empirical improvement over the baselines in the studied task of molecular property prediction is significant.
Quality and clarity: The paper is overall well motivated. The descriptions are accompanied by formulas and helpful schematic diagrams. The empirical analysis covers several aspects of the proposed so
Even though the method is likely applicable and useful in other domains, the paper only studies it on the molecular property prediction task. This significantly cuts into the impact of the paper as the results cannot (in good faith) be extrapolated to a completely different domain of tasks. With a single model architecture for molecular property prediction as the only domain, a more detailed description is missing from the main body of the paper (e.g. input/output/latent dimensions, range of va
Taxonomy
Topics: Machine Learning in Materials Science · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks
