AaSP: Aliasing-aware Self-Supervised Pre-Training for Audio Spectrogram Transformers

Kohei Yamamoto; Kosuke Okusa

arXiv:2512.03637·cs.SD·May 15, 2026


TL;DR

AaSP introduces an aliasing-aware pre-training framework for audio spectrogram transformers, improving stability and performance by adaptively analyzing subbands and integrating alias-prone features.

Contribution

The paper proposes AaSP, a novel aliasing-aware self-supervised learning method that enhances audio spectrogram transformer representations through adaptive subband analysis and aliasing mitigation.

Findings

01 Achieves state-of-the-art results on AS-20K, ESC-50, and NSynth benchmarks.

02 Learns more stable representations under aliasing-sensitive temporal perturbations.

03 Shows competitive performance on various audio recognition tasks.

Abstract

Transformer-based audio self-supervised learning (SSL) models commonly use spectrograms, vision-style Transformers, and masked modeling objectives. However, convolutional patchification with temporal downsampling lowers the effective Nyquist frequency and introduces aliasing, while naïve low-pass filtering may remove task-relevant high-frequency cues. We present AaSP, an aliasing-aware self-supervised pre-training framework for audio spectrogram transformers. AaSP combines an aliasing-aware patch representation, teacher-student masked modeling, a cross-attention predictor, and multi-mask contrastive regularization to learn representations that integrate features from alias-prone modulation bands while remaining stable across masked views. Its patch-embedding module, Aliasing-aware Patch Embedding (AaPE), augments standard patch tokens with features from alias-prone modulation bands…
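
The aliasing problem the abstract describes can be illustrated with a small numerical sketch. This is not the paper's method or code: the frame rate, patch stride, and modulation frequency below are illustrative assumptions, and the box filter is only a stand-in for a simple low-pass filter. It shows that striding along time folds a modulation above the new Nyquist frequency down to a lower apparent frequency, and that smoothing before striding attenuates that same modulation.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest FFT peak after removing the mean."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

# All values below are assumptions chosen for illustration, not from the paper.
frame_rate = 100.0    # spectrogram frames per second
stride = 16           # temporal stride of the patch embedding
modulation_hz = 5.0   # temporal modulation within one frequency bin

t = np.arange(1600) / frame_rate
envelope = np.sin(2 * np.pi * modulation_hz * t)

# Striding along time lowers the sampling rate, and with it the Nyquist frequency.
patch_rate = frame_rate / stride
nyquist_after = patch_rate / 2.0

# Plain striding: the modulation lies above the new Nyquist frequency and folds down.
strided = envelope[::stride]
alias_hz = dominant_frequency(strided, patch_rate)

# Box-filter low-pass before striding: less folding, but the modulation is attenuated.
kernel = np.ones(stride) / stride
smoothed = np.convolve(envelope, kernel, mode="valid")
retained_amplitude = np.max(np.abs(smoothed))

print(f"Nyquist after striding: {nyquist_after} Hz")
print(f"Modulation at {modulation_hz} Hz appears at {alias_hz} Hz after striding")
print(f"Amplitude retained after box filtering: {retained_amplitude:.3f}")
```

With these assumed values the 5 Hz modulation lies above the post-stride Nyquist frequency of 3.125 Hz, so the strided sequence carries it at a folded frequency, while the filtered version keeps only a fraction of its amplitude. That trade-off is the tension the abstract points to.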
