PhiNet v2: A Mask-Free Brain-Inspired Vision Foundation Model from Video

Makoto Yamada; Kian Ming A. Chai; Ayoub Rhim; Satoki Ishikawa; Mohammad Sabokrou; Yao-Hung Hubert Tsai

arXiv:2505.11129·cs.CV·May 19, 2025

TL;DR

PhiNet v2 is a brain-inspired, Transformer-based self-supervised learning (SSL) model that learns from video sequences without strong augmentation, achieving performance competitive with state-of-the-art vision foundation models while aligning more closely with human visual cognition.

Contribution

The paper presents PhiNet v2, a Transformer-based architecture that uses variational inference to learn from sequential visual data, advancing biologically plausible vision models.

Findings

01 Achieves accuracy competitive with state-of-the-art vision foundation models

02 Learns effectively from raw video sequences without strong augmentation

03 Aligns visual processing more closely with human cognition

Abstract

Recent advances in self-supervised learning (SSL) have revolutionized computer vision through innovative architectures and learning objectives, yet they have not fully leveraged insights from biological visual processing systems. Recently, a brain-inspired SSL model named PhiNet was proposed; it is based on a ResNet backbone and operates on static image inputs with strong augmentation. In this paper, we introduce PhiNet v2, a novel Transformer-based architecture that processes temporal visual input (that is, sequences of images) without relying on strong augmentation. Our model leverages variational inference to learn robust visual representations from continuous input streams, similar to human visual processing. Through extensive experimentation, we demonstrate that PhiNet v2 achieves competitive performance compared to state-of-the-art vision foundation models, while maintaining the…

Code & Models

Repositories

oist/phinetv2 (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · EEG and Brain-Computer Interfaces

Methods: Average Pooling · Convolution · Global Average Pooling · Kaiming Initialization · Variational Inference · Max Pooling