MSDS: Deep Structural Similarity with Multiscale Representation

Danling Kang; Xue-Hua Chen; Bin Liu; Keke Zhang; Weiling Chen; Tiesong Zhao

arXiv:2604.19159·cs.CV·April 22, 2026

TL;DR

This paper introduces MSDS, a multiscale extension of DeepSSIM, which improves deep perceptual similarity modeling by explicitly incorporating spatial scale, leading to better alignment with human visual perception.

Contribution

The paper presents a minimal multiscale framework that isolates and demonstrates the importance of spatial scale in deep-feature-based perceptual similarity models.

Findings

01. MSDS outperforms single-scale DeepSSIM on benchmark datasets.

02. Incorporating multiscale information yields statistically significant improvements.

03. The approach introduces negligible additional complexity.

Abstract

Deep-feature-based perceptual similarity models have demonstrated strong alignment with human visual perception in Image Quality Assessment (IQA). However, most existing approaches operate at a single spatial scale, implicitly assuming that structural similarity at a fixed resolution is sufficient. The role of spatial scale in deep-feature similarity modeling thus remains insufficiently understood. In this letter, we isolate spatial scale as an independent factor using a minimal multiscale extension of DeepSSIM, referred to as Deep Structural Similarity with Multiscale Representation (MSDS). The proposed framework decouples deep feature representation from cross-scale integration by computing DeepSSIM independently across pyramid levels and fusing the resulting scores with a lightweight set of learnable global weights. Experiments on multiple benchmark datasets demonstrate consistent…
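The fusion step described in the abstract (score each pyramid level independently, then combine the scores with a small set of global weights) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the 2x2 average-pooling pyramid, the softmax normalization of the weights, the weighted-sum fusion rule, and in particular `placeholder_similarity`, which is a simple correlation-style stand-in and not DeepSSIM. All function names are hypothetical.

```python
import numpy as np

def downsample2x(img):
    """Halve resolution with 2x2 average pooling (odd edges are cropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def build_pyramid(img, levels):
    """Return `levels` grayscale images, from full resolution to coarsest."""
    pyramid = [np.asarray(img, dtype=np.float64)]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid

def placeholder_similarity(x, y, eps=1e-8):
    """Stand-in for a single-scale score (NOT DeepSSIM); equals 1 for identical inputs."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((2.0 * (xc * yc).sum() + eps) / ((xc ** 2).sum() + (yc ** 2).sum() + eps))

def softmax(z):
    """Map unconstrained logits to positive weights that sum to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def multiscale_score(ref, dist, logits, single_scale=placeholder_similarity):
    """Score each pyramid level independently, then fuse with global weights.

    `logits` plays the role of the learnable global weights (one per level);
    softmax normalization and weighted-sum fusion are assumptions of this sketch.
    """
    weights = softmax(np.asarray(logits, dtype=np.float64))
    levels = len(weights)
    scores = [single_scale(r, d)
              for r, d in zip(build_pyramid(ref, levels), build_pyramid(dist, levels))]
    return float(np.dot(weights, scores)), scores

# Usage: compare a reference image with a noisy copy over three scales.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
noisy = ref + 0.5 * rng.standard_normal(ref.shape)
fused, per_level = multiscale_score(ref, noisy, logits=[0.0, 0.0, 0.0])
print(fused, per_level)
```

Because the per-level scorer is passed in as `single_scale`, the representation and the cross-scale integration stay decoupled in this sketch: swapping in a deep-feature similarity would leave the fusion code unchanged, and only one scalar per pyramid level would need to be learned.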
