YOLOv10 with Kolmogorov-Arnold networks and vision-language foundation models for interpretable object detection and trustworthy multimodal AI in computer vision perception

Marios Impraimakis, Daniel Vazquez, and Feiyu Zhou

arXiv:2603.23037 · cs.CV · March 25, 2026

Open Access

TL;DR

This paper introduces a novel interpretable object detection framework combining Kolmogorov-Arnold networks with YOLOv10 and a multimodal foundation model, enhancing transparency and trustworthiness in autonomous vision systems.

Contribution

It develops a post-hoc surrogate model for trust estimation in object detection, providing visual interpretability and reliable confidence scores in complex scenes.

Findings

01 Accurately identifies low-trust predictions under challenging conditions.

02 Enables visualization of feature influence on confidence scores.

03 Integrates multimodal captions for scene understanding without compromising interpretability.

Abstract

This paper examines the interpretable object detection capabilities of a novel Kolmogorov-Arnold network framework. The approach addresses a key limitation in computer vision for autonomous vehicle perception and beyond: detection systems offer limited transparency regarding the reliability of their confidence scores in visually degraded or ambiguous scenes. To address this limitation, a Kolmogorov-Arnold network is employed as an interpretable post-hoc surrogate to model the trustworthiness of You Only Look Once (YOLOv10) detections using seven geometric and semantic features. The additive spline-based structure of the Kolmogorov-Arnold network enables direct visualisation of each feature's influence, producing smooth and transparent functional mappings that reveal when the model's confidence is well supported and when it is unreliable. Experiments on both Common Objects in Context…
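
The additive spline structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single additive layer (a full Kolmogorov-Arnold network stacks such layers), piecewise-linear splines instead of higher-order B-splines, and synthetic data. The seven features are unnamed placeholders scaled to [0, 1], and the binary "trusted" label, the class name `AdditiveSplineSurrogate`, and the helper `hat_basis` are all hypothetical.

```python
import numpy as np

def hat_basis(x, knots):
    """Piecewise-linear (degree-1 B-spline) basis on a uniform knot grid."""
    x = np.clip(x, knots[0], knots[-1])
    h = knots[1] - knots[0]
    # Shape (n_samples, n_knots); each row has at most two non-zero weights.
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)

class AdditiveSplineSurrogate:
    """Hypothetical surrogate: trust = sigmoid(bias + sum_i phi_i(x_i))."""

    def __init__(self, n_features=7, n_knots=8, lo=0.0, hi=1.0):
        self.knots = np.linspace(lo, hi, n_knots)
        self.coef = np.zeros((n_features, n_knots))  # one spline per feature
        self.bias = 0.0

    def feature_curves(self, X):
        # phi_i(x_i) for each sample and feature, shape (n_samples, n_features).
        cols = [hat_basis(X[:, i], self.knots) @ self.coef[i]
                for i in range(X.shape[1])]
        return np.stack(cols, axis=1)

    def predict(self, X):
        z = self.bias + self.feature_curves(X).sum(axis=1)
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y, lr=0.5, steps=500):
        # Plain gradient descent on the mean logistic loss.
        bases = [hat_basis(X[:, i], self.knots) for i in range(X.shape[1])]
        for _ in range(steps):
            z = self.bias + sum(B @ c for B, c in zip(bases, self.coef))
            err = 1.0 / (1.0 + np.exp(-z)) - y  # d(loss)/d(logit)
            self.bias -= lr * err.mean()
            for i, B in enumerate(bases):
                self.coef[i] -= lr * (B.T @ err) / len(y)
        return self

# Synthetic stand-in data: feature 0 raises trust, feature 1 lowers it,
# and the remaining five placeholder features carry no signal.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 7))
true_logit = 4.0 * (X[:, 0] - 0.5) - 3.0 * (X[:, 1] - 0.5)
y = (rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

model = AdditiveSplineSurrogate().fit(X, y)
accuracy = ((model.predict(X) > 0.5) == (y > 0.5)).mean()
```

Because the model is additive, each row of `model.coef` plotted against `model.knots` gives the learned contribution of one feature to the trust logit, which is the kind of per-feature curve the abstract refers to.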

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning