Discrete Latent Perspective Learning for Segmentation and Detection

Deyi Ji; Feng Zhao; Lanyun Zhu; Wenwei Jin; Hongtao Lu; Jieping Ye

arXiv:2406.10475 · cs.CV · June 18, 2024

Open Access

TL;DR

This paper introduces DLPL, a novel framework that enables neural networks to learn perspective-invariant features from conventional single-view images, improving detection and segmentation across daily, UAV, and auto-driving scenarios.

Contribution

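The paper proposes a framework built from three modules that discretize visual features (Perspective Discrete Decomposition, PDD), transform perspectives (Perspective Homography Transformation, PHT), and fuse multi-perspective semantic information (Perspective Invariant Attention, PIA), advancing perspective-invariant learning using only conventional single-view images.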
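
The abstract names the perspective-transformation step Perspective Homography Transformation (PHT). As a rough illustration of what "transforming perspectives" with a homography means for a feature map, here is a minimal sketch assuming a single-channel map stored as nested lists and a known 3x3 homography `H`. It uses inverse mapping with nearest-neighbour sampling. The function names (`apply_homography`, `invert3`, `warp_feature_map`) are hypothetical. This is not the authors' PHT module, and it says nothing about how DLPL chooses its homographies.

```python
import math
from typing import List, Tuple

Matrix3 = List[List[float]]   # 3x3 homography, row-major
Point = Tuple[float, float]   # (x, y) = (column, row)


def apply_homography(H: Matrix3, pt: Point) -> Point:
    """Map a 2D point through H using homogeneous coordinates [x, y, 1]."""
    x, y = pt
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


def invert3(H: Matrix3) -> Matrix3:
    """Invert a 3x3 matrix in closed form via its adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = H
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("homography is singular")
    adj = [
        [e * i - f * h, -(b * i - c * h), b * f - c * e],
        [-(d * i - f * g), a * i - c * g, -(a * f - c * d)],
        [d * h - e * g, -(a * h - b * g), a * e - b * d],
    ]
    return [[entry / det for entry in row] for row in adj]


def warp_feature_map(feat: List[List[float]], H: Matrix3,
                     fill: float = 0.0) -> List[List[float]]:
    """Warp a rows x cols map by H (source -> target).

    Each target pixel pulls its value from the source location H^-1 @ target,
    rounded to the nearest pixel; locations outside the source get `fill`.
    """
    rows, cols = len(feat), len(feat[0])
    H_inv = invert3(H)
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            try:
                sx, sy = apply_homography(H_inv, (float(c), float(r)))
            except ValueError:
                out_row.append(fill)  # maps to infinity: treat as outside
                continue
            sc, sr = math.floor(sx + 0.5), math.floor(sy + 0.5)
            if 0 <= sr < rows and 0 <= sc < cols:
                out_row.append(feat[sr][sc])
            else:
                out_row.append(fill)
        out.append(out_row)
    return out


if __name__ == "__main__":
    shift_right = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(warp_feature_map([[1, 2, 3], [4, 5, 6]], shift_right))
    # -> [[0.0, 1, 2], [0.0, 4, 5]]
```

A pure translation is the simplest homography. A nonzero bottom row such as `[0.001, 0.002, 1.0]` produces a genuine perspective distortion. Inverse mapping is used so that every target pixel receives exactly one value.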

Findings

1. DLPL improves detection and segmentation accuracy across diverse datasets.
2. The framework effectively handles images from daily, UAV, and auto-driving scenarios.
3. DLPL outperforms existing methods in perspective-invariant tasks.

Abstract

In this paper, we address the challenge of Perspective-Invariant Learning in machine learning and computer vision, which involves enabling a network to understand images from varying perspectives to achieve consistent semantic interpretation. While standard approaches rely on the labor-intensive collection of multi-view images or limited data augmentation techniques, we propose a novel framework, Discrete Latent Perspective Learning (DLPL), for latent multi-perspective fusion learning using conventional single-view images. DLPL comprises three main modules: Perspective Discrete Decomposition (PDD), Perspective Homography Transformation (PHT), and Perspective Invariant Attention (PIA), which work together to discretize visual features, transform perspectives, and fuse multi-perspective semantic information, respectively. DLPL is a universal perspective learning framework applicable to a…
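
The abstract says Perspective Invariant Attention (PIA) fuses multi-perspective semantic information but does not give its formulation. The sketch below only illustrates the general idea of attention-based fusion. It assumes K feature vectors describing the same location under K perspectives, a query vector, and plain scaled dot-product attention. The raw view features serve as both keys and values, so there are no learned projections. The names `softmax` and `fuse_perspectives` are hypothetical. This is not the paper's PIA.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_perspectives(query: Vector, views: List[Vector]) -> Vector:
    """Fuse K per-perspective feature vectors with scaled dot-product attention.

    Simplification (assumption): the raw view features act as both keys and
    values, i.e. there are no learned query/key/value projections.
    """
    if not views:
        raise ValueError("need at least one perspective")
    d = len(query)
    if any(len(v) != d for v in views):
        raise ValueError("all vectors must share the query's dimension")
    scale = 1.0 / math.sqrt(d)
    # One score per perspective: similarity between the query and that view.
    scores = [scale * sum(q * k for q, k in zip(query, v)) for v in views]
    weights = softmax(scores)
    # Weighted sum of the views, computed channel by channel.
    return [sum(w * v[j] for w, v in zip(weights, views)) for j in range(d)]


if __name__ == "__main__":
    views = [[1.0, 0.0], [0.0, 1.0]]  # one location seen from two perspectives
    print(fuse_perspectives([1.0, 1.0], views))  # equal similarity -> [0.5, 0.5]
```

In a trained network the query, key, and value projections would normally be learned and the fusion applied at every spatial location. How PIA forms its query and weights is not described in the abstract above.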

Taxonomy

Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning