ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer
Youwei Pang, Xiaoqi Zhao, Lihe Zhang, Huchuan Lu

TL;DR
ComPtr introduces a simple, general transformer framework for bi-source dense prediction tasks, effectively integrating diverse visual cues and achieving strong performance across multiple vision applications.
Contribution
It proposes a task-generic bi-source transformer with consistency enhancement and difference awareness components, enabling unified dense prediction across various vision tasks.
Findings
Consistently outperforms existing methods on multiple benchmarks.
Effectively integrates multi-source visual information.
Demonstrates versatility across diverse dense prediction tasks.
Abstract
Deep learning (DL) has advanced the field of dense prediction, while gradually dissolving the inherent barriers between different tasks. However, most existing works focus on designing architectures and constructing visual cues only for the specific task, which ignores the potential uniformity introduced by the DL paradigm. In this paper, we attempt to construct a novel Complementary Transformer, ComPtr, for diverse bi-source dense prediction tasks. Specifically, unlike existing methods that over-specialize in a single task or a subset of tasks, ComPtr starts from the more general concept of bi-source dense prediction. Based on the basic dependence on information complementarity, we propose consistency enhancement and difference awareness components with which ComPtr can evacuate and collect important visual semantic cues from different image…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Remote-Sensing Image Classification · Visual Attention and Saliency Detection
Methods: Focus
