# Pairwise Comparison-Based Salient Object Ranking Using Multimodal Large Models

**Authors:** Yifan Liu, Jia Song, Chenglizhao Chen

PMC · DOI: 10.3390/s26061913 · Sensors (Basel, Switzerland) · 2026-03-18

## TL;DR

This paper introduces a new method for ranking the importance of objects in images using pairwise comparisons and multimodal large models, improving accuracy in complex scenes.

## Contribution

A novel framework called PairwiseSOR-MLMs that uses pairwise comparisons and multimodal large models to enhance salient object ranking in complex scenes.

## Key findings

- PairwiseSOR-MLMs achieves state-of-the-art or competitive performance across metrics on the ASSR and IRSR benchmarks.
- The method improves robustness in handling occlusion and semantic similarity among objects.
- The pairwise comparison approach reduces complexity in image feature extraction.

## Abstract

What are the main findings?

- Pairwise comparison effectively addresses the problems that arise in global salient object ranking.
- Multimodal large models can assist in salient object ranking.

What are the implications of the main findings?

- Improved effectiveness of salient object ranking in complex scenes.
- Reduced complexity of image feature extraction.

Salient object ranking aims to assign a relative importance order to multiple objects in an image, aligning with human visual attention. However, existing methods struggle with ranking ambiguity in complex scenes, particularly when objects are numerous, occluded, or semantically similar, leading to decreased accuracy for low-saliency objects. To address this, we propose PairwiseSOR-MLMs, a novel framework leveraging multimodal large models and pairwise comparison to achieve salient object ranking. The approach decomposes global ranking into a series of pairwise comparison tasks. It first employs object detection and instance segmentation to identify objects, uses image inpainting to reconstruct scenes by removing occlusions, and then prompts MLMs to perform pairwise comparisons based on visual saliency cues. Finally, another MLM inference aggregates these comparisons into a consistent global ranking. Experiments on ASSR and IRSR benchmarks show our method achieves state-of-the-art or competitive performance across metrics, demonstrating robustness in handling occlusion and semantic similarity. Its pairwise comparison paradigm can extend to other relative assessment tasks.
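The decomposition described above can be illustrated with a minimal Python sketch. It assumes a hypothetical `compare` callable standing in for the MLM pairwise prompt, and it replaces the paper's final MLM aggregation step with simple win counting (a Copeland-style tally), so it is not the authors' implementation. The names `rank_by_pairwise_wins`, `toy_compare`, and `TOY_SCORES`, along with the scores themselves, are made up for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def rank_by_pairwise_wins(
    objects: Sequence[str],
    compare: Callable[[str, str], str],
) -> List[str]:
    """Rank objects by the number of pairwise comparisons each one wins.

    `compare(a, b)` must return whichever of `a` or `b` is judged more salient.
    In the paper this judgment comes from an MLM prompt; here it is any callable.
    """
    wins: Dict[str, int] = {obj: 0 for obj in objects}

    # Decompose the global ranking into every unordered pair of objects.
    for a, b in combinations(objects, 2):
        winner = compare(a, b)
        if winner not in (a, b):
            raise ValueError(f"compare({a!r}, {b!r}) returned {winner!r}")
        wins[winner] += 1

    # Aggregate: more wins means a higher rank. `sorted` is stable, so
    # objects with equal win counts keep their input order.
    return sorted(objects, key=lambda obj: -wins[obj])


# Toy stand-in for the MLM comparison: hypothetical saliency scores.
TOY_SCORES = {"person": 0.9, "dog": 0.6, "bench": 0.2}


def toy_compare(a: str, b: str) -> str:
    """Return the object with the higher hypothetical saliency score."""
    return a if TOY_SCORES[a] >= TOY_SCORES[b] else b


ranking = rank_by_pairwise_wins(["bench", "person", "dog"], toy_compare)
print(ranking)  # ['person', 'dog', 'bench']
```

With n detected objects this scheme issues n(n-1)/2 comparisons. Win counting still produces a ranking when the pairwise answers are inconsistent (for example, non-transitive), although ties are then broken only by input order, which is one reason a separate aggregation step may be preferable.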

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13029977/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13029977/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC13029977/full.md

---
Source: https://tomesphere.com/paper/PMC13029977