SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction
Sumin Son, Hyesong Choi, Dongbo Min

TL;DR
SG-MIM is a structured-knowledge-guided pre-training framework that improves dense prediction tasks such as depth estimation and segmentation by integrating structured knowledge without extra annotations.
Contribution
It proposes a lightweight relational guidance framework and a selective masking strategy to incorporate structured knowledge into masked image modeling for dense prediction tasks.
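For readers unfamiliar with masked image modeling, the following is a minimal sketch of the generic idea only: hide a subset of image patches and compute the reconstruction loss on the hidden patches alone. It uses uniform random masking, which is an assumption made for illustration and is not SG-MIM's selective masking strategy or its relational guidance framework, neither of which is specified in this summary. The function names `random_patch_mask` and `masked_l1_loss` are hypothetical, and each patch is reduced to a single scalar to keep the example small.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, seed=None):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch.

    Patches are chosen uniformly at random (an illustrative assumption,
    not the selective strategy described in the paper).
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [
        [(r * grid_w + c) in masked for c in range(grid_w)]
        for r in range(grid_h)
    ]


def masked_l1_loss(pred, target, mask):
    """Mean absolute error over masked patches only.

    pred, target: grids of floats (one scalar per patch in this toy setup).
    mask: grid of booleans from random_patch_mask.
    """
    total, count = 0.0, 0
    for pred_row, target_row, mask_row in zip(pred, target, mask):
        for p, t, m in zip(pred_row, target_row, mask_row):
            if m:
                total += abs(p - t)
                count += 1
    # With nothing masked there is nothing to reconstruct.
    return total / count if count else 0.0


if __name__ == "__main__":
    mask = random_patch_mask(4, 4, mask_ratio=0.5, seed=0)
    target = [[float(r + c) for c in range(4)] for r in range(4)]
    pred = [[0.0] * 4 for _ in range(4)]
    print(sum(sum(row) for row in mask), "of 16 patches masked")
    print("masked L1 loss:", masked_l1_loss(pred, target, mask))
```

Restricting the loss to masked positions is what makes the task a reconstruction problem rather than an identity mapping; a guided or selective variant would change how `mask` is chosen, not how the loss is restricted.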
Findings
Outperforms existing methods on the KITTI, NYU-v2, and ADE20K datasets.
Improves monocular depth estimation accuracy.
Enhances semantic segmentation performance.
Abstract
Masked Image Modeling (MIM) techniques have redefined the landscape of computer vision, enabling pre-trained models to achieve exceptional performance across a broad spectrum of tasks. Despite their success, the full potential of MIM-based methods in dense prediction tasks, particularly in depth estimation, remains untapped. Existing MIM approaches primarily rely on single-image inputs, which makes it challenging to capture the crucial structured information, leading to suboptimal performance in tasks requiring fine-grained feature representation. To address these limitations, we propose SG-MIM, a novel Structured knowledge Guided Masked Image Modeling framework designed to enhance dense prediction tasks by utilizing structured knowledge alongside images. SG-MIM employs a lightweight relational guidance framework, allowing it to guide structured knowledge individually at the feature…
Taxonomy
Topics: Neural Networks and Applications
Methods: Mutual Information Machine/Mask Image Modeling
