User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments

Junfeng Lin; Yanming Xiu; Maria Gorlatova

arXiv:2601.23281 · cs.CV · February 2, 2026

Open Access

TL;DR

This paper evaluates how different user prompting styles affect open-set object detection models in XR environments and proposes enhancement strategies to improve robustness against ambiguous prompts.

Contribution

The paper analyzes the prompt-conditioned robustness of OSOD models and proposes prompt enhancement methods that improve performance under ambiguous user inputs.

Findings

01 Both models, GroundingDINO and YOLO-E, remain stable under underdetailed and standard prompts.

02 Performance degrades under ambiguous prompts, especially for GroundingDINO.

03 Prompt enhancement improves robustness, increasing mIoU by over 55%.

Abstract

Open-set object detection (OSOD) localizes objects while identifying and rejecting unknown classes at inference. While recent OSOD models perform well on benchmarks, their behavior under realistic user prompting remains underexplored. In interactive XR settings, user-generated prompts are often ambiguous, underspecified, or overly detailed. To study prompt-conditioned robustness, we evaluate two OSOD models, GroundingDINO and YOLO-E, on real-world XR images and simulate diverse user prompting behaviors using vision-language models. We consider four prompt types: standard, underdetailed, overdetailed, and pragmatically ambiguous, and examine the impact of two enhancement strategies on these prompts. Results show that both models exhibit stable performance under underdetailed and standard prompts, while they suffer degradation under ambiguous prompts. Overdetailed prompts primarily affect…
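
To make the mIoU figures concrete, the sketch below shows one way a per-prompt-type mean IoU could be computed. It is an illustration only, not the authors' evaluation protocol. The names `box_iou` and `miou_by_prompt_type` are hypothetical, and it assumes one target box per prompt, boxes given as (x1, y1, x2, y2), and a missed detection scored as IoU 0.

```python
from statistics import mean


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def miou_by_prompt_type(records):
    """Average IoU per prompt type.

    `records` is an iterable of (prompt_type, predicted_box, ground_truth_box).
    Assumption: a predicted_box of None (no detection) counts as IoU 0.
    """
    scores = {}
    for prompt_type, pred, gt in records:
        iou = box_iou(pred, gt) if pred is not None else 0.0
        scores.setdefault(prompt_type, []).append(iou)
    return {ptype: mean(vals) for ptype, vals in scores.items()}


if __name__ == "__main__":
    # Made-up boxes for illustration; these are not results from the paper.
    demo = [
        ("standard", (0, 0, 2, 2), (0, 0, 2, 2)),
        ("ambiguous", None, (0, 0, 2, 2)),
        ("ambiguous", (0, 0, 2, 2), (0, 0, 2, 2)),
    ]
    print(miou_by_prompt_type(demo))
```

Grouping by prompt type keeps the four categories (standard, underdetailed, overdetailed, pragmatically ambiguous) separately comparable, which is the comparison the abstract describes.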

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning