Masked Capsule Autoencoders

Miles Everett, Mingjun Zhong, and Georgios Leontidis

arXiv:2403.04724 · cs.CV · April 21, 2025 · 2 cites

Open Access

TL;DR

This paper introduces Masked Capsule Autoencoders, a novel self-supervised pretraining method for Capsule Networks using masked image modelling, significantly improving their performance on complex, realistic datasets.

Contribution

It reformulates Capsule Networks to incorporate masked image modelling pretraining, achieving state-of-the-art results and demonstrating the benefits of self-supervised learning for capsules.

Findings

01 Capsule Networks benefit from self-supervised pretraining.

02 Achieved a 9% improvement on the Imagenette dataset.

03 State-of-the-art results for Capsule Networks on complex images.

Abstract

We propose Masked Capsule Autoencoders (MCAE), the first Capsule Network that utilises pretraining in a modern self-supervised paradigm, specifically the masked image modelling framework. Capsule Networks have emerged as a powerful alternative to Convolutional Neural Networks (CNNs). They have shown favourable properties when compared to Vision Transformers (ViT), but have struggled to learn effectively when presented with more complex data. This has led to Capsule Network models that do not scale to modern tasks. Our proposed MCAE model alleviates this issue by reformulating the Capsule Network to use masked image modelling as a pretraining stage before finetuning in a supervised manner. Across several experiments and ablation studies we demonstrate that, similarly to CNNs and ViTs, Capsule Networks can also benefit from self-supervised pretraining, paving the way for further…
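The masked image modelling stage mentioned in the abstract can be illustrated with a small, generic sketch. This is not the authors' implementation and contains no capsule layers: it only shows the common pattern of splitting an image into patches, hiding a random subset, and scoring a reconstruction on the hidden patches alone. The function names (`patchify`, `random_mask`, `masked_mse`), the patch size, the mask ratio, and the choice to compute the loss only over masked patches are all assumptions made for illustration.

```python
import random


def patchify(image, patch):
    """Split a square 2D image (list of rows) into non-overlapping patch x patch blocks."""
    n = len(image)
    assert n % patch == 0, "image side must be divisible by the patch size"
    patches = []
    for r in range(0, n, patch):
        for c in range(0, n, patch):
            patches.append([row[c:c + patch] for row in image[r:r + patch]])
    return patches


def random_mask(num_patches, mask_ratio, seed=0):
    """Pick which patch indices are hidden; return (visible_ids, masked_ids)."""
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility only
    num_masked = int(num_patches * mask_ratio)
    masked = set(rng.sample(range(num_patches), num_masked))
    visible = [i for i in range(num_patches) if i not in masked]
    return visible, sorted(masked)


def masked_mse(target_patches, predicted_patches, masked_ids):
    """Mean squared error over the masked patches only (an assumed loss choice)."""
    total, count = 0.0, 0
    for i in masked_ids:
        for t_row, p_row in zip(target_patches[i], predicted_patches[i]):
            for t, p in zip(t_row, p_row):
                total += (t - p) ** 2
                count += 1
    return total / count if count else 0.0


# Toy 8x8 "image" with pixel values 0..63, cut into four 4x4 patches.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
patches = patchify(image, patch=4)
visible_ids, masked_ids = random_mask(len(patches), mask_ratio=0.5)

# Stand-in "prediction": all zeros. A real model would encode the visible
# patches and decode an estimate of the masked ones.
zeros = [[[0] * 4 for _ in range(4)] for _ in patches]
loss = masked_mse(patches, zeros, masked_ids)
```

In a pretraining loop of this kind, only the patches in `visible_ids` would be fed to the encoder, and the reconstruction loss would drive the weights that are later finetuned with labels.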

Taxonomy

Topics: Handwritten Text Recognition Techniques

Methods: Capsule Network