User Prompting Strategies and Prompt Enhancement Methods for Open-Set Object Detection in XR Environments
Junfeng Lin, Yanming Xiu, Maria Gorlatova

TL;DR
This paper evaluates how different user prompting styles affect open-set object detection models in XR environments and proposes enhancement strategies to improve robustness against ambiguous prompts.
Contribution
The paper analyzes prompt-conditioned robustness in OSOD models and proposes prompt enhancement methods that improve performance under ambiguous user inputs.
Findings
Both evaluated models, GroundingDINO and YOLO-E, perform stably with underdetailed and standard prompts.
Performance degrades with ambiguous prompts, especially for GroundingDINO.
Prompt enhancement improves robustness, increasing mIoU by over 55%.
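For readers unfamiliar with the metric, mIoU (mean Intersection over Union) averages the overlap between predicted and ground-truth boxes. The sketch below is a minimal illustration, not the paper's evaluation code: it assumes axis-aligned boxes in (x_min, y_min, x_max, y_max) form and one predicted box already paired with each ground-truth box. The function names `iou` and `mean_iou` are hypothetical, and this summary does not say how the paper matches boxes or whether the 55% gain is relative or absolute.

```python
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Average IoU over already-paired predicted and ground-truth boxes."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must be paired one-to-one")
    if not preds:
        return 0.0
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```

A perfect prediction contributes an IoU of 1 and a complete miss contributes 0, so a prompt that steers the detector to the wrong object lowers the average directly.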
Abstract
Open-set object detection (OSOD) localizes objects while identifying and rejecting unknown classes at inference. While recent OSOD models perform well on benchmarks, their behavior under realistic user prompting remains underexplored. In interactive XR settings, user-generated prompts are often ambiguous, underspecified, or overly detailed. To study prompt-conditioned robustness, we evaluate two OSOD models, GroundingDINO and YOLO-E, on real-world XR images and simulate diverse user prompting behaviors using vision-language models. We consider four prompt types: standard, underdetailed, overdetailed, and pragmatically ambiguous, and examine the impact of two enhancement strategies on these prompts. Results show that both models exhibit stable performance under underdetailed and standard prompts, while they suffer degradation under ambiguous prompts. Overdetailed prompts primarily affect…
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
