ViT$^3$: Unlocking Test-Time Training in Vision

Dongchen Han; Yining Li; Tianyu Li; Zixuan Cao; Ziming Wang; Jun Song; Yu Cheng; Bo Zheng; Gao Huang

arXiv:2512.01643 · cs.CV · April 21, 2026

TL;DR

This paper systematically studies design choices for visual Test-Time Training (TTT), introduces the ViT$^3$ model with linear complexity, and demonstrates its effectiveness across various visual tasks.

Contribution

The paper provides empirical insights and guidelines for designing effective visual TTT models, culminating in the ViT$^3$ architecture with state-of-the-art performance.

Findings

01. ViT$^3$ achieves linear computational complexity and parallelizable computation.
02. ViT$^3$ matches or outperforms existing linear-complexity models on multiple tasks.
03. The study offers practical design principles for visual TTT models.
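To make finding 01 concrete, here is a minimal, hypothetical Python sketch. It is not the ViT$^3$ implementation, whose inner module and inner training are exactly what the paper studies. Assumptions: the inner model is a single linear map W initialized to zero, the inner loss is 0.5 * sum_i ||W k_i - v_i||^2 over all N key-value pairs, and the layer takes one full-batch gradient step before applying W to the queries. Because the gradient is a sum of per-token terms, every token's contribution can be computed in parallel, and the total cost is O(N d^2) rather than the O(N^2 d) of pairwise attention scores. The names `ttt_linear_batch` and `pairwise_reference` are illustrative.

```python
def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ttt_linear_batch(keys, values, queries, lr=1.0):
    """One full-batch gradient step on a zero-initialized linear inner model.

    Inner loss: 0.5 * sum_i ||W k_i - v_i||^2. Its gradient at W = 0 is
    -sum_i v_i k_i^T, so the updated weights are W1 = lr * sum_i v_i k_i^T.
    Building W1 is O(N * d_k * d_v); applying it to N queries is the same order.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    # Each token adds an independent outer product, so this loop is a
    # reduction over tokens that can be parallelized.
    for k, v in zip(keys, values):
        for r in range(d_v):
            for c in range(d_k):
                W[r][c] += lr * v[r] * k[c]
    return [matvec(W, q) for q in queries]


def pairwise_reference(keys, values, queries, lr=1.0):
    """Same outputs via explicit query-key scores: O(N^2) in the token count."""
    outputs = []
    for q in queries:
        o = [0.0] * len(values[0])
        for k, v in zip(keys, values):
            score = lr * sum(a * b for a, b in zip(k, q))
            for r in range(len(o)):
                o[r] += score * v[r]
        outputs.append(o)
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
fast = ttt_linear_batch(keys, values, keys, lr=0.5)
slow = pairwise_reference(keys, values, keys, lr=0.5)
print(fast)  # [[1.5, 0.5], [0.5, 2.0], [2.0, 2.5]]
```

Under these particular assumptions the layer reduces to unnormalized linear attention, o_j = lr * sum_i (k_i · q_j) v_i. Richer inner models or multiple inner steps break this equivalence, which is part of the design space the paper explores.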

Abstract

Test-Time Training (TTT) has recently emerged as a promising direction for efficient sequence modeling. TTT reformulates the attention operation as an online learning problem, constructing a compact inner model from key-value pairs at test time. This reformulation opens a rich and flexible design space while achieving linear computational complexity. However, crafting a powerful visual TTT design remains challenging: fundamental choices for the inner module and inner training lack comprehensive understanding and practical guidelines. To bridge this critical gap, we present a systematic empirical study of TTT designs for visual sequence modeling. From a series of experiments and analyses, we distill six practical insights that establish design principles for effective visual TTT and illuminate paths for future improvement. These findings culminate in the Vision Test-Time…
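The reformulation described in the abstract, attention as online learning of an inner model from key-value pairs, can be illustrated with a minimal, hypothetical Python sketch of the token-by-token form. Assumptions not taken from this page: the inner model is a single matrix W initialized to zero, the self-supervised inner loss for token t is 0.5 * ||W k_t - v_t||^2, the inner optimizer is one plain gradient step per token with learning rate `lr`, and each output is the updated W applied to the query. The function name `ttt_online` is illustrative; ViT$^3$'s actual inner module and inner training are the subject of the paper's study.

```python
def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ttt_online(keys, values, queries, lr=0.5):
    """Online TTT with a linear inner model (hypothetical, simplified sketch).

    For each token t: take one gradient step on 0.5 * ||W k_t - v_t||^2,
    whose gradient with respect to W is (W k_t - v_t) k_t^T, then read out
    o_t = W q_t with the updated inner model.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    W = [[0.0] * d_k for _ in range(d_v)]  # inner model, W_0 = 0 (assumption)
    outputs = []
    for k, v, q in zip(keys, values, queries):
        err = [p - t for p, t in zip(matvec(W, k), v)]  # W k_t - v_t
        for r in range(d_v):
            for c in range(d_k):
                W[r][c] -= lr * err[r] * k[c]  # gradient step on the inner loss
        outputs.append(matvec(W, q))
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
print(ttt_online(keys, values, keys, lr=0.5))  # [[0.5, 1.0], [1.5, 2.0]]
print(ttt_online(keys, values, keys, lr=1.0))  # [[1.0, 2.0], [3.0, 4.0]]
```

Each step costs O(d_k * d_v), so N tokens cost O(N d^2), linear in N. With unit-norm keys, one step moves W k_t a fraction `lr` of the way toward v_t, which is why `lr=0.5` recovers half of each value here and `lr=1.0` recovers them exactly (the keys are also orthogonal, so updates do not interfere).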

Code & Models

Repositories

LeapLabTHU/ViTTT (GitHub)
