On the Pros and Cons of Momentum Encoder in Self-Supervised Visual Representation Learning

Trung Pham; Chaoning Zhang; Axi Niu; Kang Zhang; Chang D. Yoo

arXiv:2208.05744 · cs.CV · August 12, 2022 · 5 cites

TL;DR

This paper investigates the role of momentum (EMA) in self-supervised learning, revealing that applying momentum to the encoder's final layers or projector can improve stability and performance without increasing computation.

Contribution

The paper presents a first attempt to analyze how EMA affects different parts of the encoder, and it proposes a projector-only momentum approach for efficiency and effectiveness.

Findings

01. EMA's benefit is at least partly attributable to its stabilizing effect on training.

02. Applying EMA to the projector yields comparable or better results.

03. Projector-only momentum reduces computation while maintaining performance.
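
To make the projector-only idea concrete, here is a minimal sketch of an EMA update restricted to a named subset of parameters. It is an illustration under stated assumptions, not the authors' implementation: the function `ema_update`, the `"encoder."` / `"projector."` naming scheme, and the plain-list parameter format are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Hypothetical parameter store: name -> flat list of weights.
Params = Dict[str, List[float]]


def ema_update(target: Params, online: Params, momentum: float,
               prefix: str = "") -> None:
    """In place: target <- momentum * target + (1 - momentum) * online.

    Only entries whose name starts with `prefix` are updated, so passing
    prefix="projector." leaves every encoder entry untouched.
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    for name, online_values in online.items():
        if not name.startswith(prefix):
            continue  # skip parameters outside the selected part
        target[name] = [
            momentum * t + (1.0 - momentum) * o
            for t, o in zip(target[name], online_values)
        ]


# Assumed setup: only the projector keeps a momentum copy, so the target
# store holds no encoder weights at all.
online_params: Params = {"encoder.w": [1.0, 2.0], "projector.w": [4.0]}
momentum_projector: Params = {"projector.w": [0.0]}
ema_update(momentum_projector, online_params, momentum=0.5,
           prefix="projector.")
```

Because the momentum copy in this sketch contains only projector weights, no second copy of the encoder has to be stored or updated. Whether this matches the paper's exact training loop is not stated on this page.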

Abstract

Exponential Moving Average (EMA or momentum) is widely used in modern self-supervised learning (SSL) approaches, such as MoCo, for enhancing performance. We demonstrate that such momentum can also be plugged into momentum-free SSL frameworks, such as SimCLR, for a performance boost. Despite its wide use as a fundamental component in modern SSL frameworks, the benefit caused by momentum is not well understood. We find that its success can be at least partly attributed to the stability effect. In the first attempt, we analyze how EMA affects each part of the encoder and reveal that the portion near the encoder's input plays an insignificant role while the latter parts have much more influence. By monitoring the gradient of the overall loss with respect to the output of each block in the encoder, we observe that the final layers tend to fluctuate much more than other layers during…
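
For reference, the EMA (momentum) update the abstract refers to is commonly written as below. The symbols are assumptions borrowed from the usual MoCo-style notation, not taken from this page: $\theta_q$ denotes the online (gradient-trained) parameters, $\theta_k$ the momentum parameters, and $m \in [0, 1)$ the momentum coefficient.

```latex
% One EMA step: the momentum copy moves slowly toward the online weights.
\theta_k \leftarrow m\,\theta_k + (1 - m)\,\theta_q

% Unrolled over t steps, with \theta_q^{(i)} the online weights at step i:
\theta_k^{(t)} = m^{t}\,\theta_k^{(0)} + (1 - m) \sum_{i=1}^{t} m^{\,t-i}\,\theta_q^{(i)}
```

The unrolled form shows that the momentum parameters are a geometrically weighted average of past online parameters, so a value of $m$ close to 1 averages over a longer history and changes more slowly, which is consistent with the stability effect the abstract describes.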

Taxonomy

Topics: Digital Imaging for Blood Diseases · Advanced Vision and Imaging · Domain Adaptation and Few-Shot Learning

Methods: 1x1 Convolution · Bottleneck Residual Block · Convolution · Kaiming Initialization · Residual Connection · Dense Connections · Feedforward Network · Average Pooling