Discrete Latent Perspective Learning for Segmentation and Detection
Deyi Ji, Feng Zhao, Lanyun Zhu, Wenwei Jin, Hongtao Lu, Jieping Ye

TL;DR
This paper introduces DLPL, a novel framework that enables neural networks to learn perspective-invariant features from single-view images, improving performance across various vision tasks and scenarios.
Contribution
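The paper proposes DLPL, a framework built from three modules: Perspective Discrete Decomposition (PDD) discretizes visual features, Perspective Homography Transformation (PHT) transforms perspectives, and Perspective Invariant Attention (PIA) fuses multi-perspective semantic information. Together they target perspective-invariant learning from single-view images.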
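To make the three steps concrete, here is a minimal NumPy sketch of the general ideas: nearest-codebook discretization, a homography applied to token coordinates, and softmax attention over the resulting views. It is an illustration under stated assumptions, not the authors' implementation. All function names (`quantize`, `apply_homography`, `warp_tokens`, `attention_fuse`), the codebook size, the grid size, and the example homography are hypothetical choices made for this sketch.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature (N, D) to its nearest codebook entry (K, D)."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

def apply_homography(H, points):
    """Apply a 3x3 homography H to 2D points (N, 2)."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the projective scale

def warp_tokens(tokens, grid_hw, H):
    """Resample a row-major token grid at homography-mapped positions
    using nearest-neighbour lookup (an assumption of this sketch)."""
    h, w = grid_hw
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    src = apply_homography(H, pts)
    sx = np.clip(np.rint(src[:, 0]), 0, w - 1).astype(int)
    sy = np.clip(np.rint(src[:, 1]), 0, h - 1).astype(int)
    return tokens[sy * w + sx]

def attention_fuse(query, views):
    """Fuse V views (V, N, D) per token with softmax attention against query (N, D)."""
    scores = np.einsum("nd,vnd->vn", query, views) / np.sqrt(query.shape[-1])
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)
    return np.einsum("vn,vnd->nd", weights, views)

# Toy run: a 4x4 grid of 8-dim features and a 16-entry codebook (all hypothetical).
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8))
codebook = rng.normal(size=(16, 8))
tokens, _ = quantize(features, codebook)

identity = np.eye(3)
tilt = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.05, 0.0, 1.0]])  # mild perspective
views = np.stack([warp_tokens(tokens, (4, 4), H) for H in (identity, tilt)])
fused = attention_fuse(tokens, views)
print(fused.shape)
```

In this sketch the homographies are fixed by hand; how the perspectives are actually chosen or learned is not specified in the summary above, so that part is left as an assumption.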
Findings
DLPL improves detection and segmentation accuracy across diverse datasets.
The framework handles images from everyday, UAV, and autonomous-driving scenarios.
DLPL outperforms existing methods in perspective-invariant tasks.
Abstract
In this paper, we address the challenge of Perspective-Invariant Learning in machine learning and computer vision, which involves enabling a network to understand images from varying perspectives to achieve consistent semantic interpretation. While standard approaches rely on the labor-intensive collection of multi-view images or limited data augmentation techniques, we propose a novel framework, Discrete Latent Perspective Learning (DLPL), for latent multi-perspective fusion learning using conventional single-view images. DLPL comprises three main modules: Perspective Discrete Decomposition (PDD), Perspective Homography Transformation (PHT), and Perspective Invariant Attention (PIA), which work together to discretize visual features, transform perspectives, and fuse multi-perspective semantic information, respectively. DLPL is a universal perspective learning framework applicable to a…
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
