POINTS-Seeker: Towards Training a Multimodal Agentic Search Model from Scratch

Yikun Liu; Yuan Liu; Le Tian; Xiao Zhou; Jiangchao Yao; Yanfeng Wang; Weidi Xie

arXiv:2604.14029 · cs.CV · April 16, 2026

TL;DR

This paper introduces POINTS-Seeker, a multimodal agentic search model trained from scratch rather than retrofitted from a general LMM. It adds a dedicated training phase (Agentic Seeding) and adaptive history compression (V-Fold) so the model can keep locating evidence over long interactions.

Contribution

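The paper presents three components: Agentic Seeding, a training phase that builds the foundational precursors for agentic behavior; V-Fold, a history-aware compression scheme for long interactions; and POINTS-Seeker-8B, a model that outperforms existing models on six benchmarks.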

Findings

01

POINTS-Seeker-8B outperforms existing models across six benchmarks.

02

V-Fold mitigates the long-horizon bottleneck in which a growing interaction history overwhelms the model's ability to locate ground-truth evidence.

03

Agentic Seeding lays the foundational precursors needed to elicit agentic behaviors in the model.

Abstract

While Large Multimodal Models (LMMs) demonstrate impressive visual perception, they remain epistemically constrained by their static parametric knowledge. To transcend these boundaries, multimodal search models have been adopted to actively interact with the external environment for evidence retrieval. Diverging from prevailing paradigms that merely retrofit general LMMs with search tools as modular extensions, we explore the potential of building a multimodal agentic search model from scratch. Specifically, we make the following contributions: (i) we introduce Agentic Seeding, a dedicated phase designed to weave the foundational precursors necessary for eliciting agentic behaviors; (ii) we uncover a performance bottleneck in long-horizon interactions, where the increasing volume of interaction history overwhelms the model's ability to locate ground-truth evidence. To mitigate this, we…

Code & Models

Models

tencent/POINTS-Seeker (Hugging Face model)
