Stylistic-STORM (ST-STORM): Perceiving the Semantic Nature of Appearance

Hamed Ouattara; Pierre Duthon; Pascal Houssam Salmane; Frédéric Bernardin; Omar Ait Aider

arXiv:2604.16086·cs.CV·April 20, 2026

TL;DR

ST-STORM is a hybrid self-supervised learning framework that disentangles appearance and content, improving semantic robustness and appearance understanding in tasks like weather analysis and medical imaging.

Contribution

ST-STORM introduces a dual-stream architecture with gating mechanisms that learn appearance signatures and invariant semantic representations separately.
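
As a rough illustration of what gating between two streams can mean, the sketch below splits one shared feature vector into a style part and a content part with a per-channel sigmoid gate. This listing does not describe the paper's actual gating design, so everything here is an assumption made for illustration: the function name `gated_split`, the use of a sigmoid gate, and the complementary `g` / `1 - g` routing are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_split(features, gate_logits):
    """Split one shared feature vector into style and content parts.

    Hypothetical illustration only, NOT the paper's gating mechanism:
    a per-channel gate g in (0, 1) routes g * f to the style stream
    and the remaining (1 - g) * f to the content stream.
    """
    if len(features) != len(gate_logits):
        raise ValueError("features and gate_logits must have the same length")
    gates = [sigmoid(z) for z in gate_logits]
    style = [g * f for g, f in zip(gates, features)]
    content = [(1.0 - g) * f for g, f in zip(gates, features)]
    return style, content
```

In this toy version the two parts always sum back to the input, so the gate only decides which stream carries each channel. In a trained model the gate logits would be learned, and each stream would feed its own objective.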

Findings

01. Style branch captures complex appearance phenomena with high F1 scores (97% on Multi-Weather, 94% on ISIC 2024).

02. Content branch maintains semantic performance with 80% F1 on ImageNet-1K.

03. Disentangling appearance improves critical task performance without degrading semantic accuracy.

Abstract

One of the dominant paradigms in self-supervised learning (SSL), illustrated by MoCo or DINO, aims to produce robust representations by capturing features that are insensitive to certain image transformations, such as illumination or geometric changes. This strategy is appropriate when the objective is to recognize objects independently of their appearance. However, it becomes counterproductive as soon as appearance itself constitutes the discriminative signal. In weather analysis, for example, rain streaks, snow granularity, atmospheric scattering, reflections, and halos are not noise: they carry the essential information. In critical applications such as autonomous driving, ignoring these cues is risky, since grip and visibility depend directly on ground and atmospheric conditions. We introduce ST-STORM, a hybrid SSL framework that treats appearance (style) as a…
