FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Yukang Cao, Chenyang Si, Jinghao Wang, Ziwei Liu

TL;DR
FreeMorph is a novel tuning-free image morphing method that uses guidance-aware spherical interpolation and step-oriented variation to produce high-quality, fast, and consistent transitions between images with different semantics or layouts.
Contribution
It introduces a tuning-free approach for image morphing that integrates guidance-aware interpolation and variation trends, eliminating the need for per-instance model fine-tuning.
Findings
Outperforms existing methods in quality and speed, being 10x to 50x faster.
Establishes new state-of-the-art in image morphing.
Effectively handles images with different semantics or layouts.
Abstract
We present FreeMorph, the first tuning-free method for image morphing that accommodates inputs with different semantics or layouts. Unlike existing methods that rely on finetuning pre-trained diffusion models and are limited by time constraints and semantic/layout discrepancies, FreeMorph delivers high-fidelity image morphing without requiring per-instance training. Despite their efficiency and potential, tuning-free methods face challenges in maintaining high-quality results due to the non-linear nature of the multi-step denoising process and biases inherited from the pre-trained diffusion model. In this paper, we introduce FreeMorph to address these challenges by integrating two key innovations. 1) We first propose a guidance-aware spherical interpolation design that incorporates explicit guidance from the input images by modifying the self-attention modules, thereby addressing…
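As a rough illustration of the spherical interpolation that the abstract builds on, here is a minimal sketch of plain spherical linear interpolation (slerp) between two latent arrays. It is not the authors' guidance-aware variant: the self-attention modification is not shown, and the names `slerp`, `morph_latents`, `z0`, `z1`, and `num_frames` are hypothetical, chosen only for this example.

```python
import numpy as np


def slerp(z0, z1, t, eps=1e-7):
    """Spherical linear interpolation between two same-shaped latent arrays.

    Plain slerp only; the guidance-aware part described in the abstract
    (modified self-attention) is NOT implemented here.
    """
    a = z0.ravel().astype(np.float64)
    b = z1.ravel().astype(np.float64)
    # Angle between the two latents, treated as flat vectors.
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    sin_omega = np.sin(omega)
    if sin_omega < eps:
        # Nearly parallel (or antiparallel) vectors: fall back to linear interpolation.
        return (1.0 - t) * z0 + t * z1
    w0 = np.sin((1.0 - t) * omega) / sin_omega
    w1 = np.sin(t * omega) / sin_omega
    return w0 * z0 + w1 * z1


def morph_latents(z0, z1, num_frames):
    """Return num_frames interpolated latents, endpoints included (hypothetical helper)."""
    return [slerp(z0, z1, t) for t in np.linspace(0.0, 1.0, num_frames)]
```

In a diffusion pipeline, `z0` and `z1` would typically be the inverted noise latents of the two input images, and each interpolated latent would then be denoised into one frame of the morphing sequence. Slerp is commonly preferred over linear interpolation here because it keeps intermediate latents at a norm closer to what the model saw in training.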
Peer Reviews
Decision: Submitted to ICLR 2025
The quantitative and qualitative experiments, especially the ablation study, seem well-executed and thorough, with the exception of one important method missing (see weaknesses). I really appreciate the authors thoroughly evaluating across different levels of challenging interpolations and providing a large number of qualitative comparisons in the appendix. The proposed method seems reasonable in construction and effective in practice. In the comparisons performed by the authors, their proposed me
Attention Interpolation for Text-to-Image Diffusion (He et al., 2024) is cited in the related work section but never compared to, despite the methods seeming quite similar. Given that the authors were aware of this work at the time of submission, I would expect the authors to discuss how their and this method relate and incorporate it into quantitative and qualitative comparisons. Similarly, Smooth Diffusion (Guo et al., CVPR 2024) is quite closely related and could also be compared to, althoug
1. This method does not require training, making it more accessible to regular users.
2. The authors have conducted extensive experiments to identify the best combination of modules and hyperparameters.
3. Although this is a task that is difficult to evaluate, I can appreciate the authors’ efforts to provide both quantitative and qualitative results.
Related to the task
1. There are some unverified claims at the core of this task. For example, see lines 254-255.
2. What constitutes good image morphing is somewhat underdefined. The intended effect that this paper seeks to achieve feels too vague and abstract, making it challenging to identify a clear definition of effective image morphing.
3. Even with quantitative metrics like PPL, high scores do not necessarily indicate results that align with human preferences, which can be highly subjecti
- The proposed tuning-free morphing approach is straightforward and effective. The modified interpolation of the self-attention is able to produce more natural transitions between morphing sequences.
- A new dataset, Morph4Data, is presented and used in the evaluation. This dataset contains four categories, which helps in a detailed analysis of image morphing methods.
- This paper presents and analyzes many baseline settings in the ablation study. This helps in understanding the specific role
- This paper is apparently not well prepared for publication. For example:
  - There are many errors in spelling (“ur” in line 122, “owever” in line 133, etc.) and formulations (missing “{“ and “}” in line 179, etc.).
  - In equation (7), since m has the same size as z, what does it mean by “m=1”?
  - Both “DeepMorpher” and “deepmorpher” appear in the paper.
  - The image captions of Fig. 2 and Fig. 3 are too simple. Readers have to turn to the main text to try to understand the figures.
  - In the
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis
Methods: Diffusion
