Stylistic-STORM (ST-STORM): Perceiving the Semantic Nature of Appearance
Hamed Ouattara, Pierre Duthon, Pascal Houssam Salmane, Frédéric Bernardin, Omar Ait Aider

TL;DR
ST-STORM is a hybrid self-supervised learning framework that disentangles appearance and content, improving semantic robustness and appearance understanding in tasks like weather analysis and medical imaging.
Contribution
It introduces a dual-stream architecture with gating mechanisms to separately learn appearance signatures and invariant semantic representations.
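As a rough illustration of what a gate between two streams can look like, the sketch below splits one feature vector into a "style" part and a "content" part using a per-channel sigmoid gate. This is not the paper's architecture, which is not detailed on this page: the function name `gated_split`, the complementary gate `(1 - g)`, and the toy numbers are all assumptions made for illustration only.

```python
import math


def sigmoid(x):
    """Standard logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_split(features, gate_logits):
    """Split a feature vector into (style, content) with a soft per-channel gate.

    Hypothetical helper: a gate value near 1 routes a channel to the style
    stream, a value near 0 routes it to the content stream.
    """
    if len(features) != len(gate_logits):
        raise ValueError("features and gate_logits must have the same length")
    gates = [sigmoid(z) for z in gate_logits]
    style = [g * f for g, f in zip(gates, features)]
    content = [(1.0 - g) * f for g, f in zip(gates, features)]
    return style, content


# Toy values, not taken from the paper.
features = [0.5, -1.2, 2.0, 0.3]
gate_logits = [4.0, -4.0, 0.0, 1.0]
style, content = gated_split(features, gate_logits)
```

With complementary gates the two streams always sum back to the input features, so no information is discarded at the split; whether ST-STORM's gating has this property is not stated here.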
Findings
Style branch captures complex appearance phenomena with high F1 scores (97% on Multi-Weather, 94% on ISIC 2024).
Content branch maintains semantic performance with 80% F1 on ImageNet-1K.
Disentangling appearance improves critical task performance without degrading semantic accuracy.
Abstract
One of the dominant paradigms in self-supervised learning (SSL), illustrated by MoCo and DINO, aims to produce robust representations by capturing features that are insensitive to certain image transformations, such as illumination or geometric changes. This strategy is appropriate when the objective is to recognize objects independently of their appearance. However, it becomes counterproductive as soon as appearance itself constitutes the discriminative signal. In weather analysis, for example, rain streaks, snow granularity, atmospheric scattering, reflections, and halos are not noise: they carry the essential information. In critical applications such as autonomous driving, ignoring these cues is risky, since grip and visibility depend directly on ground and atmospheric conditions. We introduce ST-STORM, a hybrid SSL framework that treats appearance (style) as a…
