Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning

Lin Li; Jiawei Huang; Qihao Quan; Dan Li; Boxin Li; Xiao Zhang; Erli Meng; Wenjie Feng; Jian Lou; See-Kiong Ng

arXiv:2605.09395·cs.AI·May 19, 2026

TL;DR

This paper introduces MarsTSC, a multimodal time series classification framework that pairs vision-language models with agentic reasoning, a self-evolving knowledge bank, and test-time updates. It reports stronger few-shot performance than classical and foundation models on 12 benchmarks.

Contribution

The paper presents the first VLM agentic reasoning framework for few-shot multimodal time series classification, incorporating a self-evolving knowledge bank and a test-time update strategy.

Findings

01 MarsTSC outperforms classical and foundation models on 12 benchmarks.

02 It improves few-shot classification accuracy across 6 VLM backbones.

03 The framework provides interpretable rationales grounded in human-readable features.

Abstract

In this paper, we propose the first VLM agentic reasoning framework for few-shot multimodal Time Series Classification (MarsTSC), which introduces a self-evolving knowledge bank as a dynamic context iteratively refined via reflective agentic reasoning. The framework comprises three collaborative roles: i) Generator conducts reliable classification via reasoning; ii) Reflector diagnoses the root causes of reasoning errors to yield discriminative insights targeting the temporal features overlooked by Generator; iii) Modifier applies verified updates to the knowledge bank to prevent context collapse. We further introduce a test-time update strategy to enable cautious, continuous knowledge bank refinement to mitigate…
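
The Generator, Reflector, and Modifier loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `KnowledgeBank`, `refine`, `modifier`, `toy_generator`, and `toy_reflector` are hypothetical, the roles are plain functions standing in for VLM calls, and the verification rule (accept an insight only if accuracy on the few-shot support set does not drop) is an assumption, since the abstract does not say how updates are verified.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Series = List[float]
Example = Tuple[Series, str]  # (time series, label)


@dataclass
class KnowledgeBank:
    """Hypothetical stand-in for the knowledge bank: a list of text insights."""
    insights: List[str] = field(default_factory=list)


# Each role is a plain callable so that a real VLM call could be swapped in.
Generator = Callable[[Series, KnowledgeBank], str]
Reflector = Callable[[Series, str, str, KnowledgeBank], Optional[str]]


def accuracy(generator: Generator, bank: KnowledgeBank,
             examples: List[Example]) -> float:
    """Fraction of examples the generator labels correctly with this bank."""
    correct = sum(generator(x, bank) == y for x, y in examples)
    return correct / len(examples)


def modifier(generator: Generator, bank: KnowledgeBank, insight: str,
             examples: List[Example]) -> KnowledgeBank:
    """Apply an update only if it is verified.

    Assumed rule: keep the insight when support-set accuracy does not drop.
    """
    if insight in bank.insights:
        return bank
    candidate = KnowledgeBank(bank.insights + [insight])
    if accuracy(generator, candidate, examples) >= accuracy(generator, bank, examples):
        return candidate
    return bank


def refine(generator: Generator, reflector: Reflector, bank: KnowledgeBank,
           examples: List[Example], rounds: int = 3) -> KnowledgeBank:
    """Classify, reflect on each error, and pass insights to the modifier."""
    for _ in range(rounds):
        for x, y in examples:
            pred = generator(x, bank)
            if pred == y:
                continue
            insight = reflector(x, y, pred, bank)
            if insight is not None:
                bank = modifier(generator, bank, insight, examples)
    return bank


def toy_generator(x: Series, bank: KnowledgeBank) -> str:
    # Toy behaviour: ignores the trend until the bank tells it to check it.
    if "check trend" in bank.insights:
        return "rising" if x[-1] > x[0] else "flat"
    return "flat"


def toy_reflector(x: Series, y: str, pred: str,
                  bank: KnowledgeBank) -> Optional[str]:
    # Toy diagnosis: a missed "rising" label means the trend was overlooked.
    return "check trend" if y == "rising" else None


support = [([0.0, 1.0, 2.0], "rising"), ([1.0, 1.0, 1.0], "flat")]
bank = refine(toy_generator, toy_reflector, KnowledgeBank(), support)
print(bank.insights, accuracy(toy_generator, bank, support))
```

In this toy run the generator first mislabels the rising series, the reflector proposes the insight, and the modifier keeps it because support-set accuracy improves. Gating every update behind a check is one simple way to keep the bank from accumulating unhelpful entries; the paper's own verification and test-time update rules may differ.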
