SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction

Sumin Son; Hyesong Choi; Dongbo Min

arXiv:2409.02513 · cs.CV · September 5, 2024

TL;DR

SG-MIM introduces a structured knowledge guided pre-training framework that improves dense prediction tasks such as depth estimation and semantic segmentation by integrating structured knowledge into masked image modeling without extra annotations.

Contribution

It proposes a lightweight relational guidance framework and a selective masking strategy to incorporate structured knowledge into masked image modeling for dense prediction tasks.

Findings

1. Outperforms existing methods on the KITTI, NYU-v2, and ADE20K datasets.
2. Improves monocular depth estimation accuracy.
3. Enhances semantic segmentation performance.

Abstract

Masked Image Modeling (MIM) techniques have redefined the landscape of computer vision, enabling pre-trained models to achieve exceptional performance across a broad spectrum of tasks. Despite their success, the full potential of MIM-based methods in dense prediction tasks, particularly in depth estimation, remains untapped. Existing MIM approaches primarily rely on single-image inputs, which makes it challenging to capture the crucial structured information, leading to suboptimal performance in tasks requiring fine-grained feature representation. To address these limitations, we propose SG-MIM, a novel Structured knowledge Guided Masked Image Modeling framework designed to enhance dense prediction tasks by utilizing structured knowledge alongside images. SG-MIM employs a lightweight relational guidance framework, allowing it to guide structured knowledge individually at the feature…
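For readers new to MIM, the generic recipe the abstract builds on can be sketched as: split the image into patches, hide a random subset, and score the reconstruction only on the hidden patches. The snippet below is a minimal NumPy sketch of plain random masking with a SimMIM-style L1 loss, assuming 16×16 patches and a 0.6 mask ratio. It is not SG-MIM's selective masking or relational guidance (their details are not given here), and all function names are hypothetical.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into (N, patch*patch*C) flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group pixels by (patch_row, patch_col)
    return x.reshape(-1, patch * patch * c)

def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Plain uniform random masking (not SG-MIM's selective strategy): True marks a hidden patch."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def masked_l1_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean absolute error over masked patches only; visible patches do not contribute."""
    return float(np.abs(pred[mask] - target[mask]).mean())

# Hypothetical usage: a 224x224 RGB image, 16x16 patches, 60% of patches hidden.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
patches = patchify(image, 16)                      # shape (196, 768)
mask = random_patch_mask(patches.shape[0], 0.6, rng)
corrupted = patches.copy()
corrupted[mask] = 0.0                              # stand-in for a learnable mask token
pred = corrupted                                   # placeholder "model" that echoes its input
loss = masked_l1_loss(pred, patches, mask)
```

Restricting the loss to masked patches is what forces the encoder to infer missing content from context; a dense-prediction-oriented method such as SG-MIM changes what guides that inference and which patches are hidden, not this basic mask-and-reconstruct loop.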

Taxonomy

Topics: Neural Networks and Applications

Methods: Masked Image Modeling (MIM)