Frame2Freq: Spectral Adapters for Fine-Grained Video Understanding

Thinesh Thiyakesan Ponbagavathi; Constantin Seibold; Alina Roitberg

arXiv:2602.18977 · cs.CV · February 24, 2026

Open Access

TL;DR

Frame2Freq introduces spectral adapters that use the FFT to capture multi-scale temporal dynamics for fine-grained video understanding. It outperforms prior PEFT methods and, on four datasets, even fully fine-tuned models.

Contribution

The paper proposes a frequency-aware adapter that encodes spectral information during image-to-video transfer, strengthening temporal modeling in pretrained vision models.

Findings

01. Outperforms prior PEFT methods on five datasets

02. Surpasses fully fine-tuned models on four datasets

03. Demonstrates the effectiveness of frequency analysis in temporal modeling

Abstract

Adapting image-pretrained backbones to video typically relies on time-domain adapters tuned to a single temporal scale. Our experiments show that these modules pick up static image cues and very fast flicker changes, while overlooking medium-speed motion. Capturing dynamics across multiple time-scales is, however, crucial for fine-grained temporal analysis (e.g., opening vs. closing a bottle). To address this, we introduce Frame2Freq, a family of frequency-aware adapters that perform spectral encoding during image-to-video adaptation of pretrained Vision Foundation Models (VFMs), improving fine-grained action recognition. Frame2Freq applies the Fast Fourier Transform (FFT) along the time axis and learns frequency-band-specific embeddings that adaptively highlight the most discriminative frequency ranges. Across five fine-grained activity recognition datasets, Frame2Freq outperforms prior PEFT methods…
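To make the idea of an FFT along time concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes per-frame features arranged as a (T, D) array and splits the temporal spectrum into three equal-width bands; the function name `band_energies`, the band count, and the toy signals are all hypothetical choices made for illustration. In Frame2Freq the band-specific embeddings are learned, whereas this sketch only measures how much spectral magnitude each feature channel carries per band.

```python
import numpy as np

def band_energies(features, num_bands=3):
    """Mean spectral magnitude per temporal frequency band.

    features: (T, D) array, one D-dimensional feature vector per frame.
    Returns a (num_bands, D) array.
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Real-input FFT along the time axis gives T//2 + 1 frequency bins.
    spectrum = np.fft.rfft(features, axis=0)
    magnitude = np.abs(spectrum)
    # Split the bins into contiguous low / mid / high bands (an assumed layout).
    bands = np.array_split(np.arange(magnitude.shape[0]), num_bands)
    return np.stack([magnitude[idx].mean(axis=0) for idx in bands])

# Toy clip of T = 16 frames with three feature channels.
T = 16
t = np.arange(T)
static = np.ones(T)                      # constant appearance cue
medium = np.sin(2 * np.pi * 4 * t / T)   # 4 cycles per clip: medium-speed motion
flicker = np.cos(np.pi * t)              # sign flips every frame: fast flicker
features = np.stack([static, medium, flicker], axis=1)  # shape (16, 3)

energies = band_energies(features)
dominant_band = energies.argmax(axis=0)  # strongest band for each channel
print(dominant_band)
```

In this toy setup the static channel concentrates in the lowest band, the flicker channel in the highest, and the medium-speed channel in the middle band, which is the range the abstract says time-domain adapters tend to overlook.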

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Domain Adaptation and Few-Shot Learning