CATs++: Boosting Cost Aggregation with Convolutions and Transformers

Seokju Cho; Sunghwan Hong; Seungryong Kim

arXiv:2202.06817 · cs.CV · November 1, 2022

TL;DR

CATs++ introduces a novel transformer-based cost aggregation method for image matching that leverages global receptive fields, significantly improving robustness and accuracy over previous CNN-based approaches.

Contribution

This paper presents CATs++, an extension of CATs, combining transformers with architectural innovations to enhance cost aggregation in image matching, overcoming CNN limitations and reducing computational costs.

Findings

01. Outperforms the previous state of the art on the PF-WILLOW, PF-PASCAL, and SPair-71k datasets.

02. Demonstrates significant accuracy improvements, supported by extensive ablation studies.

03. Achieves robust matching under severe deformations.

Abstract

Cost aggregation is a crucial step in image matching tasks; it aims to disambiguate noisy matching scores. Existing methods generally tackle this with hand-crafted or CNN-based techniques, which either lack robustness to severe deformations or inherit the limitations of CNNs, which fail to discriminate incorrect matches because of their limited receptive fields and lack of adaptability. In this paper, we introduce Cost Aggregation with Transformers (CATs), which tackles this by exploring global consensus among the initial correlation map, with the help of architectural designs that let us fully exploit the global receptive fields of the self-attention mechanism. Also, to alleviate some of the limitations that CATs may face, i.e., the high computational cost induced by the use of a standard transformer, whose complexity grows with the size of the spatial and feature dimensions, which restrict its…
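To make the terms in the abstract concrete, here is a minimal sketch of an initial correlation map and one global aggregation pass over it. It is an illustration under stated assumptions, not the CATs or CATs++ architecture: the features are tiny hand-written vectors, the attention step has no learned parameters, and the function names (`cosine_correlation`, `self_attention_aggregate`, `best_matches`) are hypothetical and do not come from the paper or its repository.

```python
import math

def cosine_correlation(src, tgt):
    """Initial correlation map: C[i][j] = cosine similarity of src[i] and tgt[j]."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        return [x / norm for x in v]
    src_n = [normalize(v) for v in src]
    tgt_n = [normalize(v) for v in tgt]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt_n] for s in src_n]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_aggregate(cost):
    """Toy, parameter-free self-attention over the rows of the correlation map.

    Assumption for illustration only: each source position's row of matching
    scores is one token, and queries, keys, and values are all the raw rows.
    Every row attends to every other row, which is the "global receptive
    field" property; a real model would use learned projections.
    """
    dim = len(cost[0])
    scale = 1.0 / math.sqrt(dim)
    refined = []
    for query in cost:
        weights = softmax([scale * sum(a * b for a, b in zip(query, key)) for key in cost])
        mixed = [sum(w * value[j] for w, value in zip(weights, cost)) for j in range(dim)]
        # Residual connection: keep the original scores and add the aggregated ones.
        refined.append([c + m for c, m in zip(query, mixed)])
    return refined

def best_matches(cost):
    """For each source position, the index of the highest-scoring target position."""
    return [max(range(len(row)), key=row.__getitem__) for row in cost]

# Made-up 2-D features for three source and three target positions.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.0, 2.0], [3.0, 0.0], [2.0, 2.0]]

cost = cosine_correlation(src, tgt)
refined = self_attention_aggregate(cost)
matches = best_matches(cost)
```

The correlation map has one row per source position and one column per target position, so its size grows with the product of the two spatial resolutions. This is one way to see why the cost of attending over it depends on the spatial dimensions, as the abstract notes.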

Code & Models

Repositories

KU-CVLAB/CATs-PlusPlus
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques