TL;DR
RoMAE is a Transformer-based autoencoder that uses Rotary Positional Embeddings (RoPE) to learn across diverse data modalities without modality-specific architectures.
Contribution
RoMAE extends the Masked Autoencoder (MAE) with Rotary Positional Embeddings (RoPE) so that positions can be continuous rather than integer indices, enabling cross-modal learning without time-series-specific modifications.
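To make "continuous positions" concrete, the following is a minimal sketch of a rotary embedding applied to real-valued timestamps. It is not the authors' implementation: the function names `rope_rotate` and `dot`, the `base` value of 10000, and the sample timestamps and vectors are all assumptions chosen for illustration. The only idea taken from the text above is that a position fed to RoPE does not need to be an integer.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `position`.

    `position` may be any real number (for example a timestamp),
    not only an integer token index.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Per-pair frequency, following the commonly used RoPE schedule (assumed here).
        theta = base ** (-i / dim)
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical irregularly sampled timestamps, used directly as positions.
timestamps = [0.0, 0.37, 1.92, 5.5]
query = [0.5, -1.0, 2.0, 0.25]
key = [1.0, 0.3, -0.7, 1.5]

# Attention logit between the observation at t=1.92 and the one at t=0.37.
score = dot(rope_rotate(query, timestamps[2]), rope_rotate(key, timestamps[1]))

# Shifting both timestamps by the same amount leaves the logit unchanged,
# because only the difference between the two positions enters the product.
shifted = dot(
    rope_rotate(query, timestamps[2] + 10.0),
    rope_rotate(key, timestamps[1] + 10.0),
)
```

Because the rotation is a pure function of the position value, irregular sampling needs no resampling or interpolation step before the embedding is applied.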
Findings
RoMAE outperforms specialized time-series architectures on difficult datasets such as the DESC ELAsTiCC Challenge.
RoMAE maintains MAE's performance on other modalities like images and audio.
Including learned embeddings in the input breaks RoPE's relative position property.
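The last finding can be illustrated with a small numerical check. The summary does not say how the property breaks, so the sketch below rests on an assumption: the learned embedding is modeled as an extra token pinned to a fixed position (0.0 here) while the data tokens keep their own positions. The names `rotate`, `score`, and `anchor`, and all numeric values, are hypothetical. Two-dimensional vectors are written as complex numbers, so a rotary rotation is a multiplication by a unit complex number.

```python
import cmath

def rotate(z, position, theta=1.0):
    """Rotate a 2-D vector, encoded as a complex number, by position * theta radians."""
    return z * cmath.exp(1j * position * theta)

def score(q, q_pos, k, k_pos):
    """Real inner product of the two rotated 2-D vectors."""
    return (rotate(q, q_pos) * rotate(k, k_pos).conjugate()).real

q = complex(0.8, -0.3)
k = complex(0.1, 1.2)
anchor = complex(1.0, 0.5)  # hypothetical stand-in for a learned embedding

shift = 2.5

# Data token vs. data token: both positions move together, so the score is unchanged.
pair_before = score(q, 1.3, k, 0.4)
pair_after = score(q, 1.3 + shift, k, 0.4 + shift)

# Anchor vs. data token: the anchor stays at position 0.0 (assumption),
# so the score now depends on where the data token sits in absolute terms.
anchor_before = score(anchor, 0.0, k, 0.4)
anchor_after = score(anchor, 0.0, k, 0.4 + shift)

print(abs(pair_before - pair_after) < 1e-9)      # shift-invariant
print(abs(anchor_before - anchor_after) > 1e-3)  # not shift-invariant
```

Under this assumption, any token whose position does not move with the data gives the attention scores a fixed reference point, and a score measured against a fixed reference carries absolute rather than purely relative position.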
Abstract
Applying Transformers to irregular time-series typically requires specializations to their baseline architecture, which can result in additional computational overhead and increased method complexity. We present the Rotary Masked Autoencoder (RoMAE), which utilizes the popular Rotary Positional Embedding (RoPE) method for continuous positions. RoMAE is an extension to the Masked Autoencoder (MAE) that enables interpolation and representation learning with multidimensional continuous positional information while avoiding any time-series-specific architectural specializations. We showcase RoMAE's performance on a variety of modalities including irregular and multivariate time-series, images, and audio, demonstrating that RoMAE surpasses specialized time-series architectures on difficult datasets such as the DESC ELAsTiCC Challenge while maintaining MAE's usual performance across other…