Empowering VLMs for Few-Shot Multimodal Time Series Classification via Tailored Agentic Reasoning
Lin Li, Jiawei Huang, Qihao Quan, Dan Li, Boxin Li, Xiao Zhang, Erli Meng, Wenjie Feng, Jian Lou, See-Kiong Ng

TL;DR
This paper introduces MarsTSC, a few-shot multimodal time series classification framework that pairs vision-language models (VLMs) with agentic reasoning, a dynamic knowledge bank, and test-time updates, and that outperforms classical and foundation-model baselines in the few-shot setting.
Contribution
The paper presents the first vision-language (VL) agentic reasoning framework for few-shot multimodal time series classification, incorporating a self-evolving knowledge bank and a test-time update strategy.
Findings
MarsTSC outperforms classical and foundation models on 12 benchmarks.
It improves few-shot classification accuracy across 6 VLM backbones.
The framework provides interpretable rationales grounded in human-readable features.
Abstract
In this paper, we propose the first VL agentic reasoning framework for few-shot multimodal time series classification (MarsTSC), which introduces a self-evolving knowledge bank as a dynamic context iteratively refined via reflective agentic reasoning. The framework comprises three collaborative roles: i) Generator conducts reliable classification via reasoning; ii) Reflector diagnoses the root causes of reasoning errors to yield discriminative insights targeting the temporal features overlooked by Generator; iii) Modifier applies verified updates to the knowledge bank to prevent context collapse. We further introduce a test-time update strategy to enable cautious, continuous knowledge bank refinement to mitigate…
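The abstract describes the three roles only at a high level. The Python sketch below shows one possible way a single Generator -> Reflector -> Modifier round could be wired around a knowledge bank on a labeled support sample. Everything in it is a hypothetical illustration rather than the authors' implementation: the `KnowledgeBank` class, the `(action, key, text)` op format, the `reflective_step` function, and the toy `toy_generate`/`toy_reflect` functions that stand in for VLM calls. "Verified updates" is modeled here, as an assumption, by applying the Reflector's proposed ops to a trial copy and committing them only if the Generator then classifies the failing sample correctly. The test-time update strategy is not modeled.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical op format: (action, key, text), action in {"add", "edit", "delete"}.
Op = Tuple[str, str, str]
# Generator role (assumed signature): (sample, context) -> (predicted label, rationale).
Generator = Callable[[dict, str], Tuple[str, str]]
# Reflector role (assumed signature): (sample, rationale, true label, context) -> ops.
Reflector = Callable[[dict, str, str, str], List[Op]]


@dataclass
class KnowledgeBank:
    """Dynamic context of human-readable insights, keyed by id (assumed structure)."""
    entries: Dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        # Serialize the bank into a prompt-ready context string.
        return "\n".join(f"[{k}] {v}" for k, v in sorted(self.entries.items()))

    def apply(self, ops: List[Op]) -> None:
        # Apply well-formed ops; silently skip malformed or destructive ones.
        for action, key, text in ops:
            if action in ("add", "edit") and text.strip():
                self.entries[key] = text.strip()
            elif action == "delete" and key in self.entries and len(self.entries) > 1:
                del self.entries[key]  # never empty the bank entirely


def reflective_step(bank: KnowledgeBank, generate: Generator, reflect: Reflector,
                    sample: dict, label: str) -> bool:
    """One Generator -> Reflector -> Modifier round; returns True if the bank changed."""
    pred, rationale = generate(sample, bank.render())
    if pred == label:
        return False  # correct prediction: nothing to reflect on
    # Reflector proposes insights targeting what the Generator missed.
    ops = reflect(sample, rationale, label, bank.render())
    # Modifier (assumed check): trial-apply, then re-run the Generator.
    trial = copy.deepcopy(bank)
    trial.apply(ops)
    new_pred, _ = generate(sample, trial.render())
    if new_pred != label:
        return False  # reject unverified update; bank left unchanged
    bank.entries = trial.entries
    return True


# Toy stand-ins for VLM calls (hypothetical): the Generator can only flag an
# anomaly once the bank contains an insight mentioning "peak".
def toy_generate(sample: dict, context: str) -> Tuple[str, str]:
    if "peak" in context and sample["max"] > 5:
        return "anomaly", "peak exceeds 5"
    return "normal", "no rule applies"


def toy_reflect(sample: dict, rationale: str, label: str, context: str) -> List[Op]:
    return [("add", "k1", "A peak above 5 indicates anomaly.")]


bank = KnowledgeBank()
updated = reflective_step(bank, toy_generate, toy_reflect, {"max": 7.0}, "anomaly")
print(updated, bank.render())  # True [k1] A peak above 5 indicates anomaly.
```

Trial-applying updates before committing them is one simple way to keep a misleading insight from entering the shared context; a real system would likely verify against more than the single failing sample.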
