Improving Joint Embedding Predictive Architecture with Diffusion Noise
Yuping Qiu, Rui Zhu, Ying-cong Chen

TL;DR
This paper introduces N-JEPA, a novel self-supervised learning method that integrates diffusion noise with masked image modeling to improve image classification performance.
Contribution
It proposes combining diffusion noise with masked image modeling to enhance the representation capacity of self-supervised learning (SSL) for recognition tasks.
Findings
Improved downstream classification accuracy
Enhanced robustness through multi-level noise scheduling
Effective integration of diffusion noise with masked image modeling
Abstract
Self-supervised learning (SSL) has become a highly successful approach to feature learning and is widely applied to downstream tasks. It has proven especially effective for discriminative tasks, where it surpasses currently popular generative models. Generative models, however, perform better at image generation and detail enhancement. It is therefore natural to look for a connection between SSL and generative models that could further enhance the representation capacity of SSL. Because generative models create new samples by approximating the data distribution, such modeling should also yield a semantic understanding of raw visual data, which recognition tasks require. This motivates us to combine the core principle of diffusion models, diffusion noise, with SSL to learn a competitive recognition model. Specifically, diffusion noise can be viewed as a particular state of mask that reveals a…
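The idea that diffusion noise acts as a particular state of mask can be illustrated with the standard DDPM forward process, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to a subset of image patches. The sketch below is not the authors' N-JEPA implementation: the linear beta schedule, the noise levels, the mask ratio, and the helper names `linear_alpha_bars`, `noise_patch`, and `noisy_mask` are all hypothetical choices made for illustration.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for a linear DDPM-style beta schedule (assumed, not from the paper)."""
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_patch(patch, alpha_bar, rng):
    """Forward diffusion q(x_t | x_0): sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)."""
    a = math.sqrt(alpha_bar)
    s = math.sqrt(1.0 - alpha_bar)
    return [a * x + s * rng.gauss(0.0, 1.0) for x in patch]


def noisy_mask(patches, mask_ratio, levels, alpha_bars, rng):
    """Hypothetical helper: noise a random subset of patches instead of blanking them out.

    Each selected patch gets its own timestep drawn from `levels`, so the
    corruption spans several noise levels. Returns the corrupted patches and
    the sorted indices of the patches that were noised.
    """
    n = len(patches)
    k = int(round(mask_ratio * n))
    chosen = set(rng.sample(range(n), k))
    out = []
    for i, p in enumerate(patches):
        if i in chosen:
            t = rng.choice(levels)
            out.append(noise_patch(p, alpha_bars[t], rng))
        else:
            out.append(list(p))  # visible patch, copied unchanged
    return out, sorted(chosen)


# Usage: 16 toy "patches" of 4 values each, half of them noised at one of three levels.
rng = random.Random(0)
alpha_bars = linear_alpha_bars()
patches = [[float(i)] * 4 for i in range(16)]
corrupted, idx = noisy_mask(patches, mask_ratio=0.5, levels=[100, 500, 999],
                            alpha_bars=alpha_bars, rng=rng)
print(len(idx))  # 8
```

Under this assumed schedule, alpha_bar at t = 999 is close to zero, so a patch noised at that level is almost pure Gaussian noise, much like a fully masked token, while a small t leaves the patch nearly intact. How N-JEPA actually schedules and combines noise levels with masking is not specified on this page.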
