Thoughts on Objectives of Sparse and Hierarchical Masked Image Model

Asahi Miyazaki; Tsuyoshi Okita

arXiv:2505.08819·eess.IV·May 15, 2025

TL;DR

This paper investigates the impact of different mask patterns on the performance of the SparK masked image model, proposing a new Mesh Mask-ed SparK variant to improve self-supervised learning outcomes.

Contribution

It introduces a novel mask pattern for SparK, called Mesh Mask, and analyzes its effects on pre-training performance in masked image modeling.

Findings

01 Mesh Mask improves pre-training effectiveness

02 Mask pattern choice significantly affects model performance

03 Proposed pattern outperforms previous masking strategies

Abstract

Masked image modeling is one of the most popular training objectives. Recently, the SparK model was proposed and showed superior performance among self-supervised learning models. This paper proposes a new mask pattern for SparK and calls the resulting model the Mesh Mask-ed SparK model. We report how the mask pattern used for image masking during pre-training affects performance.
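To make the idea of a "mask pattern" concrete, the sketch below builds two patch-level masks over a grid of image patches: a uniformly random mask of the kind commonly used in masked image modeling, and a regular lattice pattern. The abstract does not define the Mesh Mask, so `mesh_patch_mask`, its `stride` parameter, the 8x8 grid, and the 0.6 mask ratio are all illustrative assumptions and should not be read as the authors' method.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, seed=0):
    """Return a grid of booleans; True marks a masked (hidden) patch.

    Patches are chosen uniformly at random until roughly `mask_ratio`
    of the grid is hidden.
    """
    rng = random.Random(seed)
    n_patches = grid_h * grid_w
    n_masked = round(n_patches * mask_ratio)
    masked = set(rng.sample(range(n_patches), n_masked))
    return [
        [(r * grid_w + c) in masked for c in range(grid_w)]
        for r in range(grid_h)
    ]


def mesh_patch_mask(grid_h, grid_w, stride=2):
    """Hypothetical regular pattern, NOT taken from the paper.

    Only patches on a lattice (row and column both multiples of
    `stride`) stay visible; every other patch is masked.
    """
    return [
        [not (r % stride == 0 and c % stride == 0) for c in range(grid_w)]
        for r in range(grid_h)
    ]


def masked_fraction(mask):
    """Fraction of patches that are hidden in a mask grid."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def render(mask):
    """Draw a mask as text: '#' is masked, '.' is visible."""
    return "\n".join("".join("#" if m else "." for m in row) for row in mask)


if __name__ == "__main__":
    for name, mask in [
        ("random", random_patch_mask(8, 8, mask_ratio=0.6)),
        ("lattice", mesh_patch_mask(8, 8, stride=2)),
    ]:
        print(name, f"masked fraction = {masked_fraction(mask):.2f}")
        print(render(mask))
```

Comparing the two rendered grids shows what a study of mask patterns varies: the random mask hides patches without spatial structure, while a regular pattern fixes where the visible context lies. In a pre-training pipeline, a mask like this would be upsampled to pixel resolution and applied to the input before the encoder sees it.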

Taxonomy

Topics: Digital Media and Visual Art