An Application-Agnostic Automatic Target Recognition System Using Vision Language Models
Anthony Palladino, Dana Gajewski, Abigail Aronica, Patryk Deptula, Alexander Hamme, Seiyoung C. Lee, Jeff Muri, Todd Nelling, Michael A. Riley, Brian Wong, Margaret Duff

TL;DR
This paper introduces a flexible, open-vocabulary ATR system built on vision-language models. Non-technical users can define target classes just before runtime using natural language descriptions, image exemplars, or both, and the system combines several techniques that exploit overlapping frames to improve performance.
Contribution
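The ATR system lets users define targets just before runtime through natural language text, image exemplars, or both. It also combines tubelet identification, bounding box re-scoring, and tubelet linking to improve detection, and visualizes the aggregate output of many overlapping frames as a mosaic.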
Findings
Effective detection of targets with minimal training data
Enhanced performance through tubelet identification and linking
Visualizations include mosaics and heatmaps of detected areas
Abstract
We present a novel Automatic Target Recognition (ATR) system using open-vocabulary object detection and classification models. A primary advantage of this approach is that target classes can be defined just before runtime by a non-technical end user, using either a few natural language text descriptions of the target, or a few image exemplars, or both. Nuances in the desired targets can be expressed in natural language, which is useful for unique targets with little or no training data. We also implemented a novel combination of several techniques to improve performance, such as leveraging the additional information in the sequence of overlapping frames to perform tubelet identification (i.e., sequential bounding box matching), bounding box re-scoring, and tubelet linking. Additionally, we developed a technique to visualize the aggregate output of many overlapping frames as a mosaic of…
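The abstract describes tubelet identification as sequential bounding box matching, followed by bounding box re-scoring. The sketch below illustrates one common way to do this; it is not the authors' implementation. It assumes that boxes from successive frames are already expressed in a shared coordinate system, that matching is greedy on intersection-over-union (IoU) with a hypothetical threshold of 0.5, and that re-scoring replaces each box score with the tubelet mean. The names `Box`, `build_tubelets`, and `rescore` are illustrative only, and tubelet linking is not shown.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box with a confidence score (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


# A tubelet is a list of (frame_index, Box) pairs for consecutive frames.
Tubelet = List[Tuple[int, Box]]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes."""
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def build_tubelets(frames: List[List[Box]], iou_threshold: float = 0.5) -> List[Tubelet]:
    """Greedily chain boxes across consecutive frames by best IoU.

    Assumption: the threshold and the greedy strategy are illustrative choices,
    not taken from the paper.
    """
    tubelets: List[Tubelet] = []
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        for tube in tubelets:
            last_t, last_box = tube[-1]
            # Only extend tubelets that were active in the previous frame.
            if not unmatched or last_t != t - 1:
                continue
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= iou_threshold:
                tube.append((t, best))
                unmatched.remove(best)
        # Any box that extended no tubelet starts a new one.
        tubelets.extend([(t, b)] for b in unmatched)
    return tubelets


def rescore(tubelet: Tubelet) -> Tubelet:
    """Replace each box score with the tubelet's mean score (assumed scheme)."""
    mean = sum(b.score for _, b in tubelet) / len(tubelet)
    return [(t, replace(b, score=mean)) for t, b in tubelet]
```

Calling `build_tubelets` on a list of per-frame detections returns one tubelet per tracked object. Passing a tubelet to `rescore` smooths its scores, so a weak detection in one frame is supported by stronger detections of the same object in neighboring frames.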
Taxonomy
Topics: Robotics and Automated Systems · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
