Less is More in Semantic Space: Intrinsic Decoupling via Clifford-M for Fundus Image Classification
Yifeng Zheng

TL;DR
This paper introduces Clifford-M, a lightweight geometric model that efficiently captures multi-scale structures in fundus images, outperforming larger CNNs without explicit frequency decomposition.
Contribution
Proposes Clifford-M, a novel geometric interaction backbone that replaces frequency-splitting modules, enabling efficient multi-scale feature fusion in fundus image classification.
Findings
Clifford-M achieves 0.8142 mean AUC-ROC on ODIR-5K without pre-training.
It attains 0.7425 macro AUC on RFMiD without fine-tuning.
It outperforms larger CNN baselines while using fewer parameters.
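For readers unfamiliar with the metric, the following is a minimal sketch of how a macro (mean per-label) AUC-ROC is commonly computed for multi-label predictions. It is not the paper's evaluation code: the function names `binary_auc` and `macro_auc` are hypothetical, and the paper's exact protocol (label set, splits, averaging) is not stated in this summary.

```python
import numpy as np

def binary_auc(y_true, y_score):
    """AUC-ROC for one label: the probability that a random positive
    is scored above a random negative (ties count as one half)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]
    neg = y_score[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC is undefined without both classes present")
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def macro_auc(Y_true, Y_score):
    """Unweighted mean of per-label AUCs; inputs have shape (samples, labels)."""
    Y_true = np.asarray(Y_true)
    Y_score = np.asarray(Y_score)
    per_label = [binary_auc(Y_true[:, k], Y_score[:, k])
                 for k in range(Y_true.shape[1])]
    return float(np.mean(per_label))
```

The pairwise comparison is quadratic in the number of samples per label, which is acceptable for illustration; a rank-based formulation is the usual choice for large evaluation sets.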
Abstract
Multi-label fundus diagnosis requires features that capture both fine-grained lesions and large-scale retinal structure. Many multi-scale medical vision models address this challenge through explicit frequency decomposition, but our ablation studies show that such heuristics provide limited benefit in this setting: replacing the proposed simple dual-resolution stem with Octave Convolution increased parameters by 35% and computation 2.23-fold without improving mean accuracy, while a fixed wavelet-based variant performed substantially worse. Motivated by these findings, we propose Clifford-M, a lightweight backbone that replaces both feed-forward expansion and frequency-splitting modules with sparse geometric interaction. The model is built on a Clifford-style rolling product that jointly captures alignment and structural variation with linear complexity,…
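The abstract is truncated before it defines the rolling product, so the following is only an illustrative sketch of one way such an operation could look, not the authors' implementation. It assumes two feature maps of shape (tokens, channels), a small set of channel shifts, a product term standing in for "alignment", and an antisymmetric wedge-like term standing in for "structural variation". The function name `rolling_geometric_product` and the `shifts` parameter are hypothetical.

```python
import numpy as np

def rolling_geometric_product(x, y, shifts=(1, 2, 4)):
    """Hypothetical sketch: pair each channel of x with cyclically shifted
    channels of y, for a sparse set of shifts. Cost is linear in the number
    of channels for each shift (no dense channel-by-channel product)."""
    align_terms, wedge_terms = [], []
    for s in shifts:
        x_s = np.roll(x, s, axis=-1)  # x with channels rolled by s
        y_s = np.roll(y, s, axis=-1)  # y with channels rolled by s
        # Alignment-style term: elementwise product with the shifted partner.
        align_terms.append(x * y_s)
        # Wedge-style term: antisymmetric, flips sign when x and y are swapped
        # and vanishes when x == y.
        wedge_terms.append(x * y_s - x_s * y)
    # Output has 2 * len(shifts) * channels features per token.
    return np.concatenate(align_terms + wedge_terms, axis=-1)
```

With inputs of shape (4, 8) and the three default shifts, the output has shape (4, 48): the first 24 features hold the alignment-style terms and the last 24 the wedge-style terms.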
