Doduo: Learning Dense Visual Correspondence from Unsupervised Semantic-Aware Flow

Zhenyu Jiang; Hanwen Jiang; Yuke Zhu

arXiv:2309.15110·cs.CV·September 27, 2023

TL;DR

Doduo is a self-supervised method that learns dense visual correspondence from in-the-wild images and videos, effectively handling dynamic scene changes for robotic perception tasks.

Contribution

Introduces Doduo, a novel self-supervised approach that incorporates semantic priors for robust dense correspondence learning without ground truth labels.

Findings

01. Outperforms existing self-supervised methods in point-level correspondence accuracy.

02. Effective in dynamic scenes with substantial transformations.

03. Demonstrates practical applications in robotics such as articulation estimation and zero-shot manipulation.

Abstract

Dense visual correspondence plays a vital role in robotic perception. This work focuses on establishing dense correspondence between a pair of images that capture dynamic scenes undergoing substantial transformations. We introduce Doduo to learn general dense visual correspondence from in-the-wild images and videos without ground truth supervision. Given a pair of images, it estimates a dense flow field encoding the displacement of each pixel in one image to its corresponding pixel in the other image. Doduo uses flow-based warping to acquire supervisory signals for training. By incorporating semantic priors into self-supervised flow training, Doduo produces accurate dense correspondence that is robust to dynamic changes in the scene. Trained on an in-the-wild video dataset, Doduo demonstrates superior performance on point-level correspondence estimation over existing self-supervised…
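
To make the idea of flow-based warping as a supervisory signal concrete, here is a minimal NumPy sketch. It is not Doduo's implementation: the abstract does not specify the loss or sampling scheme, so the sketch assumes a plain photometric L1 loss and nearest-neighbor sampling, and the function names `warp_with_flow` and `warping_loss` are hypothetical. It only shows the general principle that a flow field closer to the true correspondence yields a lower warping error, with no ground truth labels involved.

```python
import numpy as np

def warp_with_flow(target, flow):
    """Backward-warp `target` into the source frame (nearest-neighbor, an assumption).

    target: (H, W) or (H, W, C) array.
    flow:   (H, W, 2) array; flow[y, x] = (dx, dy) is the displacement from
            source pixel (x, y) to its corresponding pixel in `target`.
    Returns the warped image and a boolean mask of pixels whose match lies in bounds.
    """
    h, w = flow.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    # Clip only so that indexing is safe; out-of-bounds pixels are masked out later.
    warped = target[np.clip(ty, 0, h - 1), np.clip(tx, 0, w - 1)]
    return warped, valid

def warping_loss(source, target, flow):
    """Mean absolute difference between `source` and the warped `target` (hypothetical loss)."""
    warped, valid = warp_with_flow(target, flow)
    diff = np.abs(source.astype(float) - warped.astype(float))
    if diff.ndim == 3:
        diff = diff.mean(axis=-1)  # average over channels
    return float(diff[valid].mean()) if valid.any() else 0.0

# Toy example: the target is the source shifted one pixel to the right.
source = np.arange(16, dtype=float).reshape(4, 4)
target = np.roll(source, 1, axis=1)

true_flow = np.zeros((4, 4, 2))
true_flow[..., 0] = 1.0           # every pixel moved by dx = +1
zero_flow = np.zeros((4, 4, 2))   # a wrong guess: nothing moved

loss_true = warping_loss(source, target, true_flow)
loss_zero = warping_loss(source, target, zero_flow)
print(loss_true, loss_zero)
```

In this toy case the correct flow reproduces the source exactly on all in-bounds pixels, so its loss is lower than that of the zero-flow guess. A learned model would typically use differentiable bilinear sampling instead of rounding so that gradients can reach the flow estimate.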

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning