Iterative Deep Homography Estimation
Si-Yuan Cao, Jianxin Hu, Zehua Sheng, Hui-Liang Shen

TL;DR
The paper introduces IHN, a deep homography estimation network whose iterative refinement stage is fully trainable. It reports state-of-the-art accuracy on several datasets, including dynamic scenes with moving objects, and can be arranged in 1-scale for efficiency or 2-scale for accuracy.
Contribution
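The paper presents IHN, an iterative homography network whose iterator has tied weights and is completely trainable, unlike earlier approaches that refine by cascading networks or by using an untrainable IC-LK iterator. It also introduces IHN-mov, a variant that produces an inlier mask to handle dynamic scenes with moving objects.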
Findings
The basic 1-scale IHN already outperforms most existing methods.
The 2-scale IHN outperforms all competitors by a large margin on a variety of datasets.
IHN-mov produces an inlier mask that further improves estimation accuracy in moving-object scenes.
Abstract
We propose Iterative Homography Network, namely IHN, a new deep homography estimation architecture. Different from previous works that achieve iterative refinement by network cascading or untrainable IC-LK iterator, the iterator of IHN has tied weights and is completely trainable. IHN achieves state-of-the-art accuracy on several datasets including challenging scenes. We propose 2 versions of IHN: (1) IHN for static scenes, (2) IHN-mov for dynamic scenes with moving objects. Both versions can be arranged in 1-scale for efficiency or 2-scale for accuracy. We show that the basic 1-scale IHN already outperforms most of the existing methods. On a variety of datasets, the 2-scale IHN outperforms all competitors by a large gap. We introduce IHN-mov by producing an inlier mask to further improve the estimation accuracy of moving-objects scenes. We experimentally show that the iterative…
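The idea of refining a homography with one update rule reused at every step can be illustrated with a toy sketch. This is not the authors' implementation: the four-corner displacement parameterization, the names `iterative_refine`, `observe_residual`, and `update_fn`, and the fixed gain of 0.5 (standing in for a learned update block) are all assumptions made for illustration. In a real network the update block would see features computed from the two images, not the true residual.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve the 3x3 homography mapping 4 src corners to 4 dst corners (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def iterative_refine(src, observe_residual, update_fn, num_iters=6):
    """Refine corner displacements by applying the SAME update_fn at every iteration."""
    disp = np.zeros_like(src, dtype=float)
    history = []
    for _ in range(num_iters):
        residual = observe_residual(disp)   # hypothetical stand-in for image-derived features
        disp = disp + update_fn(residual)   # tied weights: one update rule for all iterations
        history.append(disp.copy())
    return homography_from_corners(src, src + disp), history

# Toy problem: corners of a 128x128 patch and a made-up ground-truth displacement.
src = np.array([[0, 0], [127, 0], [127, 127], [0, 127]], dtype=float)
true_disp = np.array([[5, -3], [-8, 4], [6, 7], [-2, -9]], dtype=float)

observe = lambda disp: true_disp - disp     # assumption: the residual is observed directly
update = lambda residual: 0.5 * residual    # assumption: fixed gain replaces learned weights

H, history = iterative_refine(src, observe, update)
errors = [np.abs(true_disp - d).max() for d in history]
```

In this toy setting the corner error halves at every iteration, which shows why a single shared update rule can be applied repeatedly instead of stacking separately trained stages.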
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Advanced Image Processing Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
