Diffusion Model for Dense Matching
Jisu Nam, Gyuseong Lee, Sunwoo Kim, Hyeonsu Kim, Hyoungwon Cho, Seyeon Kim, Seungryong Kim

TL;DR
This paper introduces DiffMatch, a diffusion-based framework for dense image matching that explicitly models data and prior terms, improving accuracy over existing deep learning methods.
Contribution
The paper presents a novel conditional diffusion model for dense matching, explicitly incorporating prior knowledge and improving robustness to ambiguities.
Findings
Significant performance improvements over existing methods.
Effective training stabilization and memory reduction techniques.
Enhanced inference method for better matching accuracy.
Abstract
The objective for establishing dense correspondence between paired images consists of two terms: a data term and a prior term. While conventional techniques focused on defining hand-designed prior terms, which are difficult to formulate, recent approaches have focused on learning the data term with deep neural networks without explicitly modeling the prior, assuming that the model itself has the capacity to learn an optimal prior from a large-scale dataset. Although the performance improvement was obvious, these approaches often fail to address inherent ambiguities of matching, such as textureless regions, repetitive patterns, and large displacements. To address this, we propose DiffMatch, a novel conditional diffusion-based framework designed to explicitly model both the data and prior terms. Unlike previous approaches, this is accomplished by leveraging a conditional denoising diffusion model.…
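As a hedged illustration of the two-term objective the abstract describes, the usual Bayesian reading can be written as a maximum a posteriori (MAP) estimate. The symbols here are notation chosen for this sketch and are not taken from the text above: $F$ is the dense correspondence field, and $I_{src}$, $I_{tgt}$ are the source and target images.

```latex
\begin{aligned}
F^{*} &= \arg\max_{F} \; p(F \mid I_{src}, I_{tgt}) \\
      &= \arg\max_{F} \; \Big\{ \underbrace{\log p(I_{src}, I_{tgt} \mid F)}_{\text{data term}}
         \; + \; \underbrace{\log p(F)}_{\text{prior term}} \Big\}
\end{aligned}
```

The second line follows from Bayes' rule after dropping the evidence term, which does not depend on $F$. Under this reading, methods that learn only the data term leave $\log p(F)$ implicit, whereas a conditional generative model of $F$ given the image pair targets the posterior on the first line directly.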
Peer Reviews
Decision: ICLR 2024 oral
The paper presents a novel framework for dense matching, DiffMatch, and shows significant performance improvements over existing techniques. I believe this is one of the first works to apply a diffusion model to dense correspondence (flow estimation) tasks, and the results are very encouraging. The proposed approach tries to address inherent ambiguities of matching, such as textureless regions, repetitive patterns, large displacements, or noise. The approach also seems to be efficient and sca
I do not have major concerns about the paper, other than that it lacks some details. One notable improvement would be adding more discussion of diffusion-based dense prediction networks, especially methods like DDP [1]. It is unclear to me why DDP is not directly applicable to the task of dense matching. Another possible improvement is to add diffusion-based dense prediction models as baselines (e.g., a DDP model trained on dense flow supervision).
With the caveat that this is not my precise area of specialization: I enjoyed reading the paper and think that the proposed method is elegant and interesting. The idea of treating the correspondence field as an image to be synthesized is compelling. The additional components in the pipeline (e.g. for super-res) seem appropriately chosen. The results are good -- even if they don't always beat state-of-the-art baselines -- and definitely good enough given that the technique is of independent metho
"These approaches assume that the matching prior can be learned within the model architecture by leveraging the high capacity of deep networks" For the argument in the paper to be more compelling, the above statement needs to be clarified. Exactly how is the prior "learned within the model architecture"? Can we say something more precise about how the prior is captured, and how much of it, in these earlier methods? How many samples were used to compute the MAP estimates used for statistics in
+ **Novel formulation of the problem**: to the best of my knowledge I have not seen a diffusion model used in this context to refine a matching field.
+ **Possibility to model uncertainty**: the proposed formulation models a distribution of plausible matching fields given an initial guess and therefore models implicitly the uncertainty of the matching process. Fig. 6 in the supplementary shows some preliminary analysis of the modeled uncertainty. I found this emerging property of the formulati
a) **Possible generalization concerns and limited experimental evaluation**: modeling a prior on what a good matching field looks like using a diffusion model exposes the proposed solution to generalization problems since the prior will only model the type of matching flows seen during training. For example in the extreme case where the method is trained only with match fields coming from homographies it will probably not generalize well to other types of non-rigid transformations between frames
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · AI in cancer detection
Methods: Diffusion
