CATs: Cost Aggregation Transformers for Visual Correspondence
Seokju Cho, Sunghwan Hong, Sangryul Jeon, Yunsung Lee, Kwanghoon Sohn, and Seungryong Kim

TL;DR
This paper introduces Cost Aggregation Transformers (CATs), a novel neural network architecture that leverages self-attention mechanisms for improved dense correspondence matching between semantically similar images, especially under challenging variations.
Contribution
The paper presents a new cost aggregation network using self-attention, multi-level aggregation, and appearance affinity modeling to enhance matching accuracy under severe deformations.
Findings
CATs outperform recent methods in dense correspondence tasks.
Inclusion of self-attention improves robustness to geometric variations.
Extensive ablations validate the effectiveness of architectural choices.
Abstract
We propose a novel cost aggregation network, called Cost Aggregation Transformers (CATs), to find dense correspondences between semantically similar images, a task made harder by large intra-class appearance and geometric variations. Cost aggregation is a highly important process in matching tasks, since matching accuracy depends on the quality of its output. Existing approaches to cost aggregation are either hand-crafted, and so lack robustness to severe deformations, or CNN-based, and so inherit the limitation of CNNs, whose limited receptive fields make it hard to discriminate incorrect matches. In contrast, CATs explore global consensus among the initial correlation map with the help of architectural designs that allow us to fully leverage the self-attention mechanism. Specifically, we include appearance affinity modeling to aid the cost aggregation process in order to…
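To make the terms in the abstract concrete, the following is a minimal NumPy sketch of the general idea only: build an initial correlation map from two sets of features, then refine it with a single self-attention pass so that every source location can draw on all the others. It is not the authors' implementation. The function names, the random projection weights, the single head, the residual connection, and the toy 4×4 feature grid are all assumptions made for illustration, and the appearance affinity modeling and multi-level aggregation mentioned in the summary are left out.

```python
import numpy as np


def correlation_map(feat_src, feat_tgt):
    """Cosine similarity between every source and target location.

    feat_src: (N_s, C) array, feat_tgt: (N_t, C) array -> (N_s, N_t) map.
    """
    src = feat_src / (np.linalg.norm(feat_src, axis=1, keepdims=True) + 1e-8)
    tgt = feat_tgt / (np.linalg.norm(feat_tgt, axis=1, keepdims=True) + 1e-8)
    return src @ tgt.T


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_aggregate(cost, w_q, w_k, w_v):
    """One single-head self-attention pass over the rows of the cost map.

    Each row (the scores of one source location against all target
    locations) is treated as a token. The residual connection is an
    assumption of this sketch, not a detail taken from the paper.
    """
    q, k, v = cost @ w_q, cost @ w_k, cost @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)  # (N_s, N_s)
    return cost + attn @ v


# Toy data: two 4x4 feature grids with 8 channels, flattened to 16 locations.
rng = np.random.default_rng(0)
h, w, c = 4, 4, 8
feat_src = rng.standard_normal((h * w, c))
feat_tgt = rng.standard_normal((h * w, c))

cost = correlation_map(feat_src, feat_tgt)

# Hypothetical projection weights; in a trained network these are learned.
n_t = cost.shape[1]
w_q, w_k, w_v = (rng.standard_normal((n_t, n_t)) / np.sqrt(n_t) for _ in range(3))

refined = self_attention_aggregate(cost, w_q, w_k, w_v)

# Best-matching target index for each source location.
matches = refined.argmax(axis=1)
```

The attention matrix here has one row and one column per source location, so each refined score can depend on the whole correlation map rather than on a local window, which is the contrast with limited receptive fields that the abstract draws.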
Taxonomy
Topics: Advanced Neural Network Applications · Image Enhancement Techniques · Video Surveillance and Tracking Methods
