TL;DR
This paper introduces an end-to-end trainable neural network for semantic alignment that learns from weak, image-level supervision and uses a differentiable soft inlier scoring module to stay accurate in the presence of background clutter.
Contribution
It presents a convolutional neural network architecture for semantic alignment, built around a differentiable soft inlier scoring module and trained end-to-end using only image-level supervision in the form of matching image pairs.
Findings
Achieves state-of-the-art results on multiple standard semantic alignment benchmarks.
Effectively handles background clutter and intra-class variation.
Learns from weak supervision without manual correspondence annotations.
Abstract
We tackle the task of semantic alignment where the goal is to compute dense semantic correspondence aligning two images depicting objects of the same category. This is a challenging task due to large intra-class variation, changes in viewpoint and background clutter. We present the following three principal contributions. First, we develop a convolutional neural network architecture for semantic alignment that is trainable in an end-to-end manner from weak image-level supervision in the form of matching image pairs. The outcome is that parameters are learnt from rich appearance variation present in different but semantically related images without the need for tedious manual annotation of correspondences at training time. Second, the main component of this architecture is a differentiable soft inlier scoring module, inspired by the RANSAC inlier scoring procedure, that computes the…
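The RANSAC-inspired scoring idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `correlation` and `soft_inlier_score`, the 2x3 affine parameterisation `theta`, and the Gaussian soft-inlier weighting with tolerance `tol` are all assumptions made for illustration. The point it shows is that a candidate transformation is scored by summing match scores only where the transformation predicts a correspondence, so strong matches that disagree with it (for example on background clutter) contribute little.

```python
import numpy as np


def correlation(feat_a, feat_b):
    """Dense correlation between two feature grids of shape (H, W, C).

    Features are L2-normalised per cell, so corr[i, j, k, l] is the cosine
    similarity between cell (i, j) of image A and cell (k, l) of image B.
    """
    a = feat_a / np.linalg.norm(feat_a, axis=-1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=-1, keepdims=True)
    return np.einsum("ijc,klc->ijkl", a, b)


def soft_inlier_score(corr, theta, tol=1.0):
    """Hypothetical RANSAC-style soft inlier score for an affine transform.

    theta is a 2x3 matrix mapping (x, y, 1) grid coordinates of image A to
    (x, y) coordinates in image B. Each match score corr[i, j, k, l] is
    weighted by a Gaussian of the distance between cell (k, l) and the
    location the transform predicts for cell (i, j), so only matches that
    agree with theta (soft inliers) contribute substantially. The Gaussian
    is smooth in theta, unlike a hard 0/1 inlier test.
    """
    h, w = corr.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]  # ys[k, l] = k (row), xs[k, l] = l (col)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    pred = pts @ theta.T  # (h, w, 2): predicted (x, y) in B for each A cell
    # Offsets between each predicted location and every cell of B: (h, w, h, w)
    dx = pred[..., 0][:, :, None, None] - xs[None, None, :, :]
    dy = pred[..., 1][:, :, None, None] - ys[None, None, :, :]
    weight = np.exp(-(dx ** 2 + dy ** 2) / (2.0 * tol ** 2))
    # Average the inlier-weighted match scores over the cells of A
    return float((weight * corr).sum() / (h * w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 8, 16))
    corr = correlation(feat, feat)  # align an image with itself
    identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    shifted = np.array([[1.0, 0.0, 3.0], [0.0, 1.0, 3.0]])
    print("identity:", soft_inlier_score(corr, identity))
    print("shifted: ", soft_inlier_score(corr, shifted))
```

When an image is aligned with itself, the identity transform should score higher than the shifted one, because it collects the strong self-matches on the diagonal of `corr`. Using a Gaussian rather than a hard distance threshold is a choice made here so that the score varies smoothly with `theta`; the paper's actual masking and normalisation may differ. Under weak supervision, a network could presumably be trained so that the transformation it predicts for a matching image pair maximises a score of this kind, which requires no correspondence annotations.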
