# Hyper-MDR: An open-world multimodal reasoning framework based on dynamic hypergraph and meta-strategy optimization

**Authors:** Jian Shi, Xiaobin Huang, Lianhai Yuan

PMC · DOI: 10.1371/journal.pone.0342169 · PLOS One · 2026-02-17

## TL;DR

Hyper-MDR is a new framework for open-world object detection that improves multimodal reasoning and adapts to new categories and changing environments.

## Contribution

Hyper-MDR introduces a dynamic hypergraph and meta-strategy optimization for open-world multimodal perception.

## Key findings

- Hypergraph enhancement models high-order dependencies in vision–language–semantic triplets: image regions, text, and semantic prototypes become nodes, and a gating network generates hyperedges between them.
- A hierarchical hypergraph convolutional network propagates knowledge between known and unknown categories.
- A meta-policy-gradient controller dynamically adjusts feature fusion weights, propagation depth, and attention topology.
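The first two findings can be illustrated with a small NumPy sketch. It takes the node types named in the abstract (image regions, text, semantic prototypes) and makes several assumptions. The paper's learned gating network is replaced by a fixed sigmoid gate over cosine similarity, with hypothetical parameters `w`, `b`, and `tau`. The spatial constraint applies only between image regions, through a hypothetical `radius`. The propagation step is the standard HGNN hypergraph convolution (Feng et al., 2019), not the paper's hierarchical network, whose details this summary does not give. All function names and toy values are illustrative.

```python
import numpy as np


def gated_hyperedges(feats, pos, is_region, w=5.0, b=-2.5, tau=0.5, radius=1.0):
    """Build an incidence matrix H (nodes x hyperedges), one hyperedge per anchor node.

    feats:     (n, d) features for image regions, text tokens and semantic prototypes.
    pos:       (n, 2) box centres; only rows where is_region is True are used.
    is_region: (n,) boolean mask marking image-region nodes.
    The fixed sigmoid gate (w, b, tau) is a hypothetical stand-in for a learned gating network.
    """
    unit = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    cos = unit @ unit.T
    gate = 1.0 / (1.0 + np.exp(-(w * cos + b)))          # semantic gate in (0, 1)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
    both_regions = is_region[:, None] & is_region[None, :]
    too_far = both_regions & (dist > radius)              # spatial constraint, regions only
    H = ((gate > tau) & ~too_far).astype(float)           # column j = hyperedge anchored at node j
    np.fill_diagonal(H, 1.0)                              # each anchor joins its own hyperedge
    return H


def hypergraph_conv(X, H, theta, edge_w=None):
    """One HGNN-style layer: relu(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X theta)."""
    w = np.ones(H.shape[1]) if edge_w is None else edge_w
    dv = H @ w                                            # weighted node degrees
    de = H.sum(axis=0)                                    # hyperedge sizes
    dv_inv_sqrt = np.diag(dv ** -0.5)
    P = dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ dv_inv_sqrt
    return np.maximum(P @ X @ theta, 0.0)


if __name__ == "__main__":
    # Toy nodes (made-up values): 0 = "dog" prototype, 1 = "dog" text token,
    # 2 and 3 = image regions far apart, 4 = unrelated text token.
    feats = np.array([[1.0, 0.0, 0.0],
                      [0.9, 0.1, 0.0],
                      [0.8, 0.2, 0.0],
                      [0.8, 0.2, 0.0],
                      [0.0, 0.0, 1.0]])
    pos = np.array([[0, 0], [0, 0], [0, 0], [5, 5], [0, 0]], dtype=float)
    is_region = np.array([False, False, True, True, False])
    H = gated_hyperedges(feats, pos, is_region)
    X = feats
    for _ in range(2):                                    # stacking layers deepens propagation
        X = hypergraph_conv(X, H, np.eye(3))
    print(H)
    print(X.round(3))
```

In the demo, the two image regions are semantically similar but too far apart to share a hyperedge, yet both still mix with the "dog" prototype and text token. The unrelated token shares no hyperedge with any other node, so its features pass through unchanged. Stacking layers lets information travel along chains of hyperedges, which is one plausible reading of how knowledge could reach nodes of unknown categories.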

## Abstract

Open-world object detection (OWOD) has become a crucial paradigm for advancing intelligent perception systems, as it requires not only accurate recognition of known categories but also autonomous discovery and continuous learning of emerging unknown categories in dynamic environments. However, existing methods often suffer from shallow cross-modal interaction and rigid reasoning mechanisms, which leaves them unable to cope with the continuous emergence of new categories and the changing reliability of each modality in open-world environments. To address these limitations, we propose Hyper-MDR, a framework that combines a dynamic hypergraph with meta-strategy optimization. First, for hypergraph enhancement, image regions, text, and semantic prototypes are treated as nodes, while a gating network dynamically generates hyperedges under semantic and spatial constraints, thereby modeling the high-order dependencies of vision–language–semantic triplets and enabling deep multimodal fusion at the topological level. Second, a hierarchical hypergraph convolutional network is designed to facilitate knowledge propagation between known and unknown categories. Finally, a meta-policy-gradient adaptive controller dynamically adjusts feature fusion weights, propagation depth, and attention topology based on the detection state and historical trajectories. Experimental results on the OWOD dataset show that our proposed method achieves an accuracy of 76.8%, providing a new paradigm for open-world multimodal perception that integrates semantic depth and adaptability.
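The adaptive controller can be sketched as a REINFORCE policy gradient with a running-average reward baseline. This is an assumption: the summary does not specify the paper's meta-policy algorithm. The sketch also controls only one knob, propagation depth from a hypothetical set `(1, 2, 3)`, instead of fusion weights, depth, and attention topology together. The state vector is assumed to concatenate detection-state features with a summary of the historical trajectory. `MetaPolicyController` and its parameters are hypothetical.

```python
import numpy as np


class MetaPolicyController:
    """REINFORCE controller that picks a propagation depth from a state vector.

    Hypothetical stand-in for the paper's meta-policy controller: a linear softmax
    policy over one discrete knob, trained with a running-average reward baseline.
    """

    def __init__(self, state_dim, depths=(1, 2, 3), lr=0.1, seed=0):
        self.depths = depths
        self.W = np.zeros((state_dim, len(depths)))   # linear policy weights (uniform at start)
        self.lr = lr
        self.baseline = 0.0                            # running-average reward baseline
        self.rng = np.random.default_rng(seed)

    def probs(self, state):
        logits = state @ self.W
        z = np.exp(logits - logits.max())              # numerically stable softmax
        return z / z.sum()

    def act(self, state):
        p = self.probs(state)
        a = self.rng.choice(len(p), p=p)               # sample an action index
        return a, self.depths[a]

    def update(self, state, action, reward):
        p = self.probs(state)
        grad_logits = -p
        grad_logits[action] += 1.0                     # d log pi(a|s) / d logits
        advantage = reward - self.baseline
        self.W += self.lr * advantage * np.outer(state, grad_logits)
        self.baseline = 0.9 * self.baseline + 0.1 * reward


if __name__ == "__main__":
    ctrl = MetaPolicyController(state_dim=2)
    # Hypothetical state, e.g. [known-class confidence, unknown-object ratio].
    state = np.array([1.0, 0.5])
    for _ in range(500):
        a, depth = ctrl.act(state)
        reward = 1.0 if depth == 2 else 0.0            # toy reward favouring depth 2
        ctrl.update(state, a, reward)
    print(ctrl.probs(state).round(3))
```

In a real detector the reward would come from a detection metric measured after running the chosen configuration. The toy reward here simply pays 1 when depth 2 is chosen, so the policy learns to prefer it. The baseline does not bias the gradient but reduces its variance, which matters when rewards are noisy.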

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12912602/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12912602/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC12912602/full.md

---
Source: https://tomesphere.com/paper/PMC12912602