# Toward general object search in open reality

**Authors:** Gang Shen, Wenjun Ma, Guangyao Chen, Yonghong Tian

PMC · DOI: 10.1038/s41598-025-97251-5 · Scientific Reports · 2025-04-19

## TL;DR

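This paper formalizes General Object Search in Open Reality (GOSO), the task of deciding whether a query object from the open world appears in a gallery image, and proposes SEA-Net to address its two main challenges: high scale variance among instances of the same entity and an ever-expanding set of unknown categories.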

## Contribution

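The paper formalizes the GOSO task and introduces the Siamese Exchanged Attention Network (SEA-Net), a siamese architecture that adds a branch of stage-stacked Siamese Exchanged Attention (SEA) layers followed by a Hierarchical Feature Fusion (HFF) module for scale adaptation, plus an Open Score Fusion (OSF) module used at inference for more robust matching scores.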

## Key findings

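- The SEA layers and HFF module let SEA-Net adapt to scale variance and extract matching-friendly deep features for objects from unknown categories.
- Experiments on two new GOSO benchmarks, built from the existing COCO and LVIS datasets, consistently demonstrate the method's effectiveness.
- The OSF module, applied during inference, yields a more robust matching score in open-world scenarios.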

## Abstract

Real-world scenarios are inherently dynamic and open-ended, necessitating that current deep models adapt to general objects in open realities to be practically useful. In this paper, we extend a valuable computer vision task called General Object Search in Open Reality (GOSO). The main objective of GOSO is to determine whether an object from the open world appears in another gallery image, even when composed of arbitrary entities and backgrounds. However, two significant challenges arise: the high scale variance among different instances of the same entity and the vast openness with an ever-expanding set of unknown categories in the open world. To address these issues, we formalize the GOSO problem and propose a simple yet effective architecture named Siamese Exchanged Attention Network (SEA-Net). Specifically, based on a standard siamese structure, SEA-Net introduces a novel branch that comprises multiple stage-stacked Siamese Exchanged Attention (SEA) layers followed by a Hierarchical Feature Fusion (HFF) module, enabling efficient scale adaptation and the extraction of matching-friendly deep features. Moreover, an Open Score Fusion (OSF) module is integrated into SEA-Net during inference to yield a more robust matching score in open-world scenarios. We construct two new evaluation benchmarks suitable for the GOSO task using the existing COCO and LVIS datasets, and extensive experiments consistently demonstrate the effectiveness of the proposed method.
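
The abstract does not spell out how an SEA layer is computed. As a rough illustration only, one plausible reading of "exchanged attention" in a siamese setting is cross-attention in which each branch uses the other branch's features as keys and values. The sketch below assumes that reading; the function names (`cross_attend`, `exchanged_attention`), the absence of learned projections, and the toy token lists are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attend(tokens, context):
    """Each token attends over `context`, which serves as both keys and values.

    Assumption: no learned query/key/value projections, to keep the sketch small.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for token in tokens:
        weights = softmax([dot(token, key) * scale for key in context])
        attended.append(
            [sum(w * value[d] for w, value in zip(weights, context)) for d in range(dim)]
        )
    return attended


def exchanged_attention(query_tokens, gallery_tokens):
    """Hypothetical 'exchange': each branch attends to the other branch's tokens."""
    new_query = cross_attend(query_tokens, gallery_tokens)
    new_gallery = cross_attend(gallery_tokens, query_tokens)
    return new_query, new_gallery


if __name__ == "__main__":
    # Toy features: two query tokens and one gallery token, each of dimension 2.
    query = [[1.0, 0.0], [0.0, 1.0]]
    gallery = [[0.5, 0.5]]
    print(exchanged_attention(query, gallery))
```

In this reading, the query branch's output is a weighted mix of gallery features and vice versa, which is one way two siamese branches could condition on each other before matching. How the actual SEA layers, the HFF module, and the OSF score are defined is described in the full paper.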

## Full-text entities

- **Species:** Canis lupus familiaris (dog, subspecies) [taxon 9615], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12009333/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12009333/full.md

## References

56 references — full list in the complete paper: https://tomesphere.com/paper/PMC12009333/full.md

---
Source: https://tomesphere.com/paper/PMC12009333