Scalable Object Detection in the Car Interior With Vision Foundation Models
Sebastian Schmidt, Bálint Mészáros, Ahmet Firintepe, Stephan Günnemann

TL;DR
The paper introduces ODAL, a distributed framework that uses vision foundation models for object detection and localization in the car interior. It works around on-board resource constraints by splitting computation between the vehicle and the cloud, and is evaluated with a new metric, ODALbench.
Contribution
It proposes a novel distributed architecture for interior scene understanding using foundation models and introduces ODALbench for comprehensive performance assessment.
Findings
Fine-tuned ODAL-LLaVA achieves an ODAL score of 89%, a 71% improvement over its baseline.
ODAL-LLaVA outperforms GPT-4o by nearly 20% in ODAL score.
Fine-tuning reduces hallucinations and maintains high detection accuracy.
Abstract
AI tasks in the car interior, such as identifying and localizing externally introduced objects, are crucial for the response quality of personal assistants. However, computational resources of on-board systems remain highly constrained, restricting the deployment of such solutions directly within the vehicle. To address this limitation, we propose the novel Object Detection and Localization (ODAL) framework for interior scene understanding. Our approach leverages vision foundation models through a distributed architecture, splitting computational tasks between the on-board system and the cloud. This design overcomes the resource constraints of running foundation models directly in the car. To benchmark model performance, we introduce ODALbench, a new metric for comprehensive assessment of detection and localization. Our analysis demonstrates the framework's potential to establish new standards in this domain. We…
