Text-guided Generation of Efficient Personalized Inspection Plans
Xingpeng Sun, Zherong Pan, Xifeng Gao, Kui Wu, Aniket Bera

TL;DR
This paper introduces a training-free, vision-language model-guided method for generating efficient, personalized inspection trajectories based on text descriptions, applicable to known environments in various engineering fields.
Contribution
The paper presents a training-free approach that uses vision-language models to extract POIs and refine trajectories, then solves a Traveling Salesman Problem (TSP) to order the visits, for inspection planning in known scenes.
Findings
Effective trajectory generation in real-world environments
Adheres to user instructions for inspection tasks
Applicable to aerial and underwater vehicles
Abstract
We propose a training-free, Vision-Language Model (VLM)-guided approach for efficiently generating trajectories to facilitate target inspection planning based on text descriptions. Unlike existing Vision-and-Language Navigation (VLN) methods designed for general agents in unknown environments, our approach specifically targets the efficient inspection of known scenes, with widespread applications in fields such as medical, marine, and civil engineering. Leveraging VLMs, our method first extracts points of interest (POIs) from the text description, then identifies a set of waypoints from which the POIs are both salient and aligned with the spatial constraints defined in the prompt. Next, we interact with the VLM to iteratively refine the trajectory, preserving the visibility and prominence of the POIs. Further, we solve a Traveling Salesman Problem (TSP) to find the most efficient visitation…
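The TSP step named in the abstract can be illustrated with a minimal sketch. The abstract does not say which solver or distance metric the authors use, so the code below assumes straight-line Euclidean distances, a closed tour, and an exhaustive search that is only practical for a handful of waypoints. The function names and the waypoint coordinates are hypothetical and are not taken from the paper.

```python
from itertools import permutations
from math import dist


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def shortest_tour(points):
    """Exhaustively search every closed tour that starts at waypoint 0.

    Assumption: brute force stands in for whatever solver the paper uses;
    it is exact but scales factorially with the number of waypoints.
    """
    if len(points) < 2:
        return list(range(len(points)))
    rest = range(1, len(points))
    best = min(permutations(rest), key=lambda p: tour_length(points, (0, *p)))
    return [0, *best]


# Hypothetical waypoints (x, y, z) chosen only for illustration.
waypoints = [(0, 0, 0), (4, 0, 0), (4, 3, 0), (0, 3, 0), (2, 1, 0)]
order = shortest_tour(waypoints)
print(order, round(tour_length(waypoints, order), 3))
```

In a real scene the straight-line distance would likely be replaced by a collision-free path length between waypoints, and a heuristic or dedicated solver would replace the exhaustive search once the waypoint count grows.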
Taxonomy
Topics: Semantic Web and Ontologies · Web Applications and Data Management · Model-Driven Software Engineering Techniques
