# SOP: Selective Orthogonal Projection for Composed Image Retrieval

**Authors:** Su Cheng, Guoyang Liu

PMC · DOI: 10.3390/s26051621 · Sensors (Basel, Switzerland) · 2026-03-04

## TL;DR

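This paper introduces SOP (Selective Orthogonal Projection Network), a geometry-based method for composed image retrieval, where a query combines a reference image with a text instruction. The authors report that it outperforms prior state-of-the-art methods on the FashionIQ, Shoes, and CIRR datasets.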

## Contribution

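SOP introduces a geometry-based approach to composed image retrieval that targets two failure modes: feature distribution shifts caused by focus ambiguity, and semantic erosion caused by highly entangled visual and textual features.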

## Key findings

- SOP outperforms state-of-the-art methods on FashionIQ, Shoes, and CIRR datasets.
- The Selective Focus Recovery module effectively calibrates query features to the true target distribution.
- Orthogonal Subspace Projection enhances data fidelity by decoupling visual and semantic features.

## Abstract

The proliferation of intelligent sensor networks in urban surveillance and remote sensing has triggered the explosive growth of unstructured visual sensor data. Accurately retrieving targets from these massive streams based on complex cross-modal user intents remains a critical bottleneck for efficient intelligent perception. Composed Image Retrieval (CIR) addresses this by enabling retrieval via a multi-modal query that combines a reference image with semantic control signals. However, existing methods often struggle with abstract instructions in real-world scenarios. Consequently, models often suffer from feature distribution shifts due to focus ambiguity, as well as semantic erosion caused by highly entangled visual and textual features. To address these challenges, we propose a geometry-based Selective Orthogonal Projection Network (SOP). First, the Selective Focus Recovery module quantifies instruction uncertainty via information entropy and calibrates shifted query features to the true target distribution using structural consistency regularization. Second, to ensure data fidelity, we introduce Orthogonal Subspace Projection and Geometric Composition Fidelity. These mechanisms employ Gram–Schmidt orthogonalization to decouple features into a constant visual base and an orthogonal modification increment, restricting semantic modifications to the null space. Extensive experiments on FashionIQ, Shoes, and CIRR datasets demonstrate that SOP significantly outperforms SOTA methods, offering a novel solution for efficient large-scale sensor data retrieval and analysis.
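
The two geometric ideas named in the abstract, entropy as an uncertainty measure and a Gram–Schmidt step that splits a query into a visual base plus an orthogonal increment, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy vectors, and the choice to compute entropy over a softmax of hypothetical per-region relevance scores are all assumptions made for illustration, since this summary does not give the paper's exact formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_increment(visual, text):
    """One Gram-Schmidt step: subtract from `text` its projection onto `visual`.

    The result is orthogonal to `visual`, so adding it to `visual`
    leaves the component along the visual base unchanged.
    """
    denom = dot(visual, visual)
    if denom == 0.0:
        raise ValueError("visual feature must be non-zero")
    coeff = dot(text, visual) / denom
    return [t - coeff * v for t, v in zip(text, visual)]


# Assumption: entropy is taken over a distribution of instruction "focus"
# across image regions. A peaked distribution means a clear instruction.
h_focused = shannon_entropy(softmax([8.0, 0.0, 0.0, 0.0]))
h_ambiguous = shannon_entropy(softmax([1.0, 1.0, 1.0, 1.0]))

# Toy 3-d features standing in for image and text embeddings (hypothetical).
visual = [1.0, 2.0, 0.0]
text = [4.0, 3.0, 2.0]
increment = orthogonal_increment(visual, text)
composed = [v + d for v, d in zip(visual, increment)]

print(h_focused < h_ambiguous)   # ambiguous focus has higher entropy
print(dot(increment, visual))    # orthogonal to the visual base
print(composed)
```

In this toy setup the composed query has the same projection onto `visual` as the reference feature itself, which is one way to read the abstract's "constant visual base": the modification only moves the query within the orthogonal complement of the visual direction.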

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12986979/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12986979/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC12986979/full.md

---
Source: https://tomesphere.com/paper/PMC12986979