Auto-regressive transformation for image alignment

Kanggeon Lee, Soochahn Lee, and Kyoung Mu Lee

arXiv:2505.04864·cs.CV·April 14, 2026

TL;DR

The paper introduces Auto-Regressive Transformation (ART), a novel iterative multi-scale method that improves image alignment accuracy in challenging conditions by focusing on critical regions and refining transformations hierarchically.

Contribution

It proposes a new auto-regressive, multi-scale approach with cross-attention guidance for robust image alignment, outperforming existing methods in difficult scenarios.

Findings

1. ART outperforms state-of-the-art methods on planar images.

2. ART achieves comparable performance on 3D scene images.

3. ART effectively handles feature-sparse regions and large deformations.

Abstract

Existing methods for image alignment struggle in cases involving feature-sparse regions, extreme scale and field-of-view differences, and large deformations, often resulting in suboptimal accuracy. Robustness to these challenges can be improved through iterative refinement of the transform field while focusing on critical regions in multi-scale image representations. We thus propose Auto-Regressive Transformation (ART), a novel method that iteratively estimates the coarse-to-fine transformations through an auto-regressive pipeline. Leveraging hierarchical multi-scale features, our network refines the transform field parameters using randomly sampled points at each scale. By incorporating guidance from the cross-attention layer, the model focuses on critical regions, ensuring accurate alignment even in challenging, feature-limited conditions. Extensive experiments demonstrate that ART…
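The coarse-to-fine, auto-regressive idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the learned network and its cross-attention guidance are replaced here by a plain least-squares homography fit (DLT), the "scales" are simulated by a decreasing noise level on the sampled point targets, and every name (`apply_h`, `fit_homography`, `refine_coarse_to_fine`, `noise_per_scale`) is hypothetical. It only shows how a transform estimate can be refined step by step, with each step conditioned on the previous estimate and driven by randomly sampled points.

```python
import numpy as np


def apply_h(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of 2D points."""
    homo = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homo[:, :2] / homo[:, 2:3]


def fit_homography(src, dst):
    """Least-squares (DLT) fit of H such that dst is approximately H(src).

    Stand-in for the learned estimator; an assumption of this sketch.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    return H / H[2, 2]


def mean_error(H, H_true, probe):
    """Mean Euclidean distance between the two warps on fixed probe points."""
    diff = apply_h(H, probe) - apply_h(H_true, probe)
    return float(np.mean(np.linalg.norm(diff, axis=1)))


def refine_coarse_to_fine(H_true, noise_per_scale, n_points=64, seed=0):
    """Refine a homography estimate over several simulated scales."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(-1.0, 1.0, 5)
    probe = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)
    H = np.eye(3)  # start from the identity transform
    errors = []
    for sigma in noise_per_scale:  # coarse (noisy) to fine (precise)
        pts = rng.uniform(-1.0, 1.0, size=(n_points, 2))  # random sample points
        current = apply_h(H, pts)  # where the running estimate sends them
        target = apply_h(H_true, pts) + rng.normal(0.0, sigma, size=pts.shape)
        delta = fit_homography(current, target)  # residual transform at this scale
        H = delta @ H  # compose the residual onto the running estimate
        H = H / H[2, 2]
        errors.append(mean_error(H, H_true, probe))
    return H, errors


# Hypothetical ground-truth transform, used only to drive the toy example.
H_true = np.array([[1.10, 0.05, 0.20],
                   [-0.03, 0.95, -0.10],
                   [0.02, -0.01, 1.00]])
H_est, errors = refine_coarse_to_fine(H_true, noise_per_scale=[0.05, 0.01, 0.001])
print(errors)
```

In this toy setting the alignment error reported at each step shrinks as the simulated scale becomes finer, because each step only has to estimate a small residual on top of the previous estimate.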
