Rotary Masked Autoencoders are Versatile Learners

Uros Zivanovic; Serafina Di Gioia; Andre Scaffidi; Martín de los Rios; Gabriella Contardo; Roberto Trotta

arXiv:2505.20535·cs.LG·May 13, 2026

TL;DR

RoMAE is a Transformer-based masked autoencoder that uses Rotary Positional Embeddings (RoPE) with continuous positions, so a single architecture can learn from diverse data modalities without modality-specific modifications.

Contribution

RoMAE extends the Masked Autoencoder (MAE) with Rotary Positional Embeddings (RoPE) applied to continuous, multidimensional positions, enabling learning across modalities without time-series-specific architectural changes.
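
To make "RoPE with continuous positions" concrete, here is a minimal sketch of the standard rotary formulation applied to real-valued positions such as timestamps. It is an illustration under stated assumptions, not the authors' implementation: the function names `rope_rotate` and `score`, the frequency base of 10000, and the toy vectors are all hypothetical choices made for this example.

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Rotate consecutive feature pairs of x by angles proportional to pos.

    x:   list of floats with even length (one token's query or key vector)
    pos: real-valued position, e.g. an irregular timestamp (need not be an integer)
    base: frequency base; 10000 is the common RoPE default, assumed here
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        # One rotation frequency per feature pair.
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def score(q, k, pos_q, pos_k):
    """Unnormalized attention score between a rotated query and a rotated key."""
    q_rot = rope_rotate(q, pos_q)
    k_rot = rope_rotate(k, pos_k)
    return sum(a * b for a, b in zip(q_rot, k_rot))

# Toy query/key vectors (hypothetical values, for illustration only).
q = [0.5, -1.2, 0.3, 0.8]
k = [1.0, 0.4, -0.7, 0.2]

# Shifting both positions by the same offset leaves the score unchanged.
s1 = score(q, k, 0.37, 2.91)
s2 = score(q, k, 0.37 + 5.5, 2.91 + 5.5)
same = math.isclose(s1, s2, rel_tol=1e-9, abs_tol=1e-9)
print(same)
```

The two scores agree up to floating-point error because each rotation depends on position only through an angle, so the dot product depends only on the position difference. Nothing in the rotation requires integer positions, which is why the same mechanism can take irregular timestamps directly.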

Findings

01. RoMAE outperforms specialized time-series models on challenging datasets.
02. RoMAE maintains MAE's performance on other modalities like images and audio.
03. Including learned embeddings in the input breaks RoPE's relative position property.

Abstract

Applying Transformers to irregular time-series typically requires specializations to their baseline architecture, which can result in additional computational overhead and increased method complexity. We present the Rotary Masked Autoencoder (RoMAE), which utilizes the popular Rotary Positional Embedding (RoPE) method for continuous positions. RoMAE is an extension to the Masked Autoencoder (MAE) that enables interpolation and representation learning with multidimensional continuous positional information while avoiding any time-series-specific architectural specializations. We showcase RoMAE's performance on a variety of modalities including irregular and multivariate time-series, images, and audio, demonstrating that RoMAE surpasses specialized time-series architectures on difficult datasets such as the DESC ELAsTiCC Challenge while maintaining MAE's usual performance across other…
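
The masked-autoencoding setup the abstract builds on can be sketched for an irregular time-series as follows. This is a generic MAE-style random masking example, not RoMAE's actual masking scheme: the helper `random_mask`, the 0.75 mask ratio, and the toy observations are assumptions chosen for illustration.

```python
import random

def random_mask(num_tokens, mask_ratio=0.75, seed=None):
    """Split token indices into visible and masked sets, MAE-style.

    mask_ratio is an illustrative value, not one taken from the paper.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_tokens * mask_ratio))
    order = list(range(num_tokens))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked

# Toy irregularly sampled series: (timestamp, value) pairs, hypothetical data.
times = [0.0, 0.4, 1.7, 1.9, 3.2, 5.05, 5.1, 8.6]
values = [1.2, 1.1, 0.7, 0.8, 0.2, -0.3, -0.2, 0.9]

visible, masked = random_mask(len(times), mask_ratio=0.75, seed=0)

# The encoder would see only the visible observations, each tagged with its
# continuous timestamp; the decoder would be asked to reconstruct the values
# at the masked timestamps.
encoder_input = [(times[i], values[i]) for i in visible]
decoder_targets = [(times[i], values[i]) for i in masked]
print(len(encoder_input), len(decoder_targets))
```

Because every observation carries its own real-valued timestamp, reconstructing masked points amounts to predicting values at positions between the visible ones, which is one way to read the interpolation capability the abstract describes.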

Videos

Rotary Masked Autoencoders are Versatile Learners · SlidesLive