Fine-Tuning Vision-Language Models for Neutrino Event Analysis in High-Energy Physics Experiments

Dikshant Sagar; Kaiwen Yu; Alejandro Yankelevich; Jianming Bian; Pierre Baldi

arXiv:2508.19376·cs.LG·August 28, 2025

TL;DR

This paper shows that a fine-tuned vision-language model can classify neutrino interactions from pixelated detector images, matching or exceeding an established CNN baseline while enabling richer multimodal reasoning in high-energy physics experiments.

Contribution

The paper applies a fine-tuned vision-language model based on LLaMA 3.2 to neutrino event classification and reports performance that matches or exceeds the CNN baseline used in experiments like NOvA and DUNE, along with richer reasoning capabilities.

Findings

01 VLM matches or exceeds CNN accuracy

02 Enables richer multimodal reasoning

03 Improves integration of textual context

Abstract

Recent progress in large language models (LLMs) has shown strong potential for multimodal reasoning beyond natural language. In this work, we explore the use of a fine-tuned Vision-Language Model (VLM), based on LLaMA 3.2, for classifying neutrino interactions from pixelated detector images in high-energy physics (HEP) experiments. We benchmark its performance against an established CNN baseline used in experiments like NOvA and DUNE, evaluating metrics such as classification accuracy, precision, recall, and AUC-ROC. Our results show that the VLM not only matches or exceeds CNN performance but also enables richer reasoning and better integration of auxiliary textual or semantic context. These findings suggest that VLMs offer a promising general-purpose backbone for event classification in HEP, paving the way for multimodal approaches in experimental neutrino physics.
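
The metrics named in the abstract (classification accuracy, precision, recall, and AUC-ROC) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names are hypothetical, labels are assumed to be integer class indices, and AUC-ROC is computed for one class against the rest using the pairwise-ranking definition.

```python
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of events whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int
) -> Tuple[float, float]:
    """Precision and recall, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc_roc(y_true_binary: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive event receives a
    higher score than a random negative event (ties count as one half)."""
    pos = [s for t, s in zip(y_true_binary, scores) if t == 1]
    neg = [s for t, s in zip(y_true_binary, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form of AUC-ROC is quadratic in the number of events, so it suits small checks only; a sort-based implementation from a standard library would be the usual choice for a full test set.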
